Every Component is a Lookup: Token Attribution and Composition from a Single Decomposition
Researchers have developed a method called Unpack for analyzing the internal workings of transformer models. It uses backward recursion to trace how components such as attention and MLP layers contribute to a model's output. From a single forward pass, and without interventions or extra training, Unpack identifies interaction strengths and per-token attributions.
IMPACT: Offers a new way to inspect how transformer models arrive at their outputs, which could help with debugging and interpretability work.
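To make the idea of reading attributions off a single forward pass concrete, here is a minimal sketch in Python with NumPy. It is not the Unpack algorithm, whose details the summary does not give. It only illustrates the general property such methods build on: in a residual architecture the final state is a sum of component outputs, so a linear readout splits exactly into per-component terms, and an attention term splits further by source token. The toy one-layer model, its sizes, the random weights, and the absence of layer normalization are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, V = 4, 8, 5  # toy sizes (assumed): tokens, model width, vocabulary

# Hypothetical random weights for a one-layer, single-head model without LayerNorm.
x = rng.normal(size=(T, d))  # token embeddings
W_q, W_k, W_v, W_o = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(4))
W_in = rng.normal(size=(d, 2 * d)) / np.sqrt(d)
W_out = rng.normal(size=(2 * d, d)) / np.sqrt(2 * d)
W_u = rng.normal(size=(d, V)) / np.sqrt(d)  # unembedding (readout)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


# One forward pass; every intermediate is kept, nothing is perturbed or retrained.
scores = (x @ W_q) @ (x @ W_k).T / np.sqrt(d)
causal_mask = np.triu(np.ones((T, T), dtype=bool), k=1)
A = softmax(np.where(causal_mask, -np.inf, scores))  # attention weights, (T, T)
values = (x @ W_v) @ W_o                             # per-token value vectors, (T, d)
attn_out = A @ values
h = x + attn_out
mlp_out = np.maximum(h @ W_in, 0.0) @ W_out          # ReLU MLP
resid = h + mlp_out
logits = resid[-1] @ W_u                             # prediction at the last position

# Component-level split: the readout is linear in the residual sum.
embed_term = x[-1] @ W_u
attn_term = attn_out[-1] @ W_u
mlp_term = mlp_out[-1] @ W_u

# Token-level split of the attention term: one row per source token.
per_token = (A[-1][:, None] * values) @ W_u          # (T, V)

print("component sum matches logits:", np.allclose(embed_term + attn_term + mlp_term, logits))
print("token sum matches attention term:", np.allclose(per_token.sum(axis=0), attn_term))
```

The MLP term is left whole here because the ReLU makes it nonlinear in its input. Tracing contributions back through such components is the harder part, and the sketch does not attempt to reproduce the backward recursion the summary describes.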