PulseAugur / Brief

last 24h
[1/1] 223 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. When Attribution Patching Lies: Diagnosis and a Second-Order Correction

    Researchers propose a correction that improves the accuracy of attribution patching, a technique for estimating how much each component of a language model contributes to its behavior. Standard attribution patching is a first-order (gradient-based) approximation of activation patching, so its estimates can be unreliable where the network is strongly non-linear. The new approach adds a second-order correction computed with Hessian-vector products, which the authors report significantly improves the fidelity of circuit recovery. The method is described as computationally feasible for larger models, and it comes with practical tools for detecting untrustworthy estimates and quantifying their error.

    IMPACT Improves interpretability of AI models, enabling more reliable circuit identification and debugging.
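
    To make the first-order versus second-order distinction concrete, here is a minimal sketch on a toy scalar metric. It is not the paper's method, which this brief does not spell out. It is a generic second-order Taylor correction, and the names `metric`, `grad_metric`, `hvp`, the weights `w`, and the activations `a_corr` and `a_clean` are all hypothetical. The Hessian-vector product is approximated by a central finite difference of the gradient, as a stand-in for the autodiff call a real model would use.

```python
import math

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def metric(a, w):
    # Toy non-linear "model output" as a function of one activation vector.
    return math.tanh(dot(w, a))

def grad_metric(a, w):
    # Analytic gradient of tanh(w . a) with respect to a.
    t = metric(a, w)
    return [(1.0 - t * t) * wi for wi in w]

def hvp(a, w, v, eps=1e-4):
    # Hessian-vector product H(a) v via a central difference of the gradient.
    # Assumption: a real implementation would use autodiff instead.
    g_plus = grad_metric([ai + eps * vi for ai, vi in zip(a, v)], w)
    g_minus = grad_metric([ai - eps * vi for ai, vi in zip(a, v)], w)
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]

# Hypothetical weights and activations from a corrupted and a clean run.
w = [1.0, -0.5]
a_corr = [0.2, 0.1]
a_clean = [0.8, -0.3]
delta = [c - k for c, k in zip(a_clean, a_corr)]

# Ground truth: actually patch the clean activation into the corrupted run.
true_effect = metric(a_clean, w) - metric(a_corr, w)

# First-order estimate: gradient at the corrupted point times the difference.
first_order = dot(grad_metric(a_corr, w), delta)

# Second-order estimate: add half of delta^T H delta, using one HVP.
second_order = first_order + 0.5 * dot(delta, hvp(a_corr, w, delta))

err_first = abs(first_order - true_effect)
err_second = abs(second_order - true_effect)
print(f"true={true_effect:.4f} first={first_order:.4f} second={second_order:.4f}")
```

    On this toy example the second-order estimate lands closer to the true patching effect than the first-order one. The size of the second-order term relative to the first-order term is one plausible signal that a linear estimate should not be trusted, though whether the paper uses that particular diagnostic is an assumption here.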