Brief · PulseAugur

RESEARCH · arXiv cs.AI · 3d · [2 sources]

Every Component is a Lookup: Token Attribution and Composition from a Single Decomposition

Researchers have introduced Unpack, a method for analyzing the internal workings of transformer models. It uses backward recursion to trace how components such as attention and MLP layers contribute to a model's output. Unpack recovers interaction strengths and per-token attributions from a single forward pass, with no interventions or extra training.
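
For readers new to this style of analysis, the general idea of reading component contributions off a single forward pass can be illustrated with a toy example. The sketch below is not the Unpack algorithm. It shows plain linear attribution on a hypothetical three-dimensional residual stream, assuming the final state is the sum of what each component wrote and ignoring any final normalization layer. All names and values (`writes`, `readout`, `attribute`) are made up for illustration.

```python
# Toy residual-stream model (hypothetical, NOT the Unpack method):
# the final state is the sum of each component's write, so a linear
# readout splits exactly into per-component contributions.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def add(u, v):
    """Elementwise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]

def attribute(writes, readout):
    """Return each component's share of the readout logit."""
    return {name: dot(vec, readout) for name, vec in writes.items()}

# Vectors each component wrote to the residual stream in one forward pass
# (illustrative values only).
writes = {
    "embed": [1.0, 0.0, 0.5],
    "attn":  [0.5, 1.0, 0.0],
    "mlp":   [0.0, 2.0, 1.0],
}
readout = [2.0, 1.0, -1.0]  # assumed unembedding direction for one token

# Final residual state is the sum of all component writes.
final_state = [0.0, 0.0, 0.0]
for vec in writes.values():
    final_state = add(final_state, vec)

logit = dot(final_state, readout)
shares = attribute(writes, readout)

for name, value in shares.items():
    print(f"{name}: {value:+.2f}")
print(f"total logit: {logit:+.2f}")
```

Because the readout is linear, the per-component shares sum exactly to the logit, so no ablation or patching run is needed in this toy setting. Real transformers add nonlinearities such as layer normalization and attention softmax, which is where a method like the one described would have to do more work.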

IMPACT Provides a new method for understanding transformer behavior that could aid debugging and improve model interpretability.

Pythia
GPT-2 small
Wang et al. (2023)