PulseAugur
research · [2 sources]

Transformer models' hierarchical representations are decodable but not always causally used

Researchers investigated how transformer models represent hierarchical structure, specifically in tasks involving bracket sequences. They found that while information about the hierarchy, such as depth and position, is decodable from the model's internal states, its causal role is less clear: masking attention to the top-of-stack position substantially degraded accuracy, whereas altering residual-stream subspaces had minimal effect. Decodability, in short, does not imply causal use.
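The experimental logic can be sketched in a few lines. Below is a minimal, hypothetical PyTorch illustration (not the authors' code) of the decodability half and a subspace-ablation intervention: it trains a linear probe to read bracket depth out of a toy transformer's residual stream, then projects that direction away. The model sizes, the probe, and the ablation scheme are all illustrative assumptions; the paper's actual causal test compares task accuracy under such edits against attention masking.

```python
# Minimal sketch (illustrative assumptions throughout, not the paper's setup):
# probe a toy transformer's residual stream for bracket depth, then ablate
# the probe's subspace and re-test decodability.
import torch
import torch.nn as nn

torch.manual_seed(0)

def random_balanced_brackets(length=20):
    """Balanced bracket sequence (0='(', 1=')') plus depth after each token."""
    seq, depth, open_count = [], [], 0
    for i in range(length):
        remaining = length - i
        if open_count == 0:
            tok = 0                             # must open
        elif open_count >= remaining - 1:
            tok = 1                             # must close to stay balanceable
        else:
            tok = int(torch.randint(0, 2, (1,)).item())
        open_count += 1 if tok == 0 else -1
        seq.append(tok)
        depth.append(open_count)
    return seq, depth

d_model = 32
embed = nn.Embedding(2, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, dim_feedforward=64,
                                   batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
encoder.eval()  # deterministic: disable dropout

seqs, depths = zip(*(random_balanced_brackets() for _ in range(64)))
x = torch.tensor(seqs)                          # (batch, seq)
y = torch.tensor(depths, dtype=torch.float)     # (batch, seq)
with torch.no_grad():
    hidden = encoder(embed(x))                  # residual-stream states

# Decodability: fit a linear probe from hidden states to bracket depth.
probe = nn.Linear(d_model, 1)
opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
flat_h, flat_y = hidden.reshape(-1, d_model), y.reshape(-1, 1)
for _ in range(300):
    opt.zero_grad()
    loss = nn.functional.mse_loss(probe(flat_h), flat_y)
    loss.backward()
    opt.step()
print(f"probe MSE on depth: {loss.item():.3f}")

# Intervention: project out the probe's direction, then re-test decodability.
with torch.no_grad():
    d = probe.weight / probe.weight.norm()      # (1, d_model) unit direction
    ablated = hidden - (hidden @ d.T) * d       # remove that subspace
    mse = nn.functional.mse_loss(probe(ablated.reshape(-1, d_model)), flat_y)
print(f"probe MSE after ablation: {mse.item():.3f}")
# Per the summary above, edits of this kind left task accuracy largely intact,
# while masking attention to the top-of-stack position degraded it.
```

The contrast the summary describes is precisely this gap: removing the decodable subspace destroys the probe's signal yet can leave model behavior unchanged, while the attention-level intervention does not.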

Summary written by gemini-2.5-flash-lite from 2 sources.

IMPACT Investigates the causal role of internal representations in transformers, potentially informing future model interpretability research.

RANK_REASON Academic paper on transformer model interpretability.

Read on arXiv cs.CL →

COVERAGE [2]

  1. arXiv cs.CL TIER_1 · Aryan Sharma, Cutter Dawes, Shivam Raval

    Dissociating Decodability and Causal Use in Bracket-Sequence Transformers

    arXiv:2604.22128v1 · Abstract: When trained on tasks requiring an understanding of hierarchical structure, transformers have been found to represent this hierarchy in distinct ways: in the geometry of the residual stream, and in stack-like attention patterns main…

  2. arXiv cs.CL TIER_1 · Shivam Raval

    Dissociating Decodability and Causal Use in Bracket-Sequence Transformers

    When trained on tasks requiring an understanding of hierarchical structure, transformers have been found to represent this hierarchy in distinct ways: in the geometry of the residual stream, and in stack-like attention patterns maintaining a last-in, first-out ordering. However, …
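The "last-in, first-out" attention pattern both abstracts mention suggests a concrete form for the intervention the summary describes: block each position's attention to its current top-of-stack token. Below is a hypothetical sketch of how such a mask could be built; the index convention (top of stack measured after the current token) and the masking scheme are assumptions, not the paper's exact setup.

```python
# Build a per-position attention mask that blocks the top-of-stack index,
# in the boolean convention of torch.nn.MultiheadAttention's attn_mask
# (True = attention disallowed). Illustrative, not the paper's code.
import torch

def top_of_stack_indices(seq):
    """seq: list of 0='(' / 1=')'. For each position, return the index of the
    most recent unmatched '(' after processing that token (-1 if stack empty)."""
    stack, tops = [], []
    for i, tok in enumerate(seq):
        if tok == 0:
            stack.append(i)
        else:
            stack.pop()
        tops.append(stack[-1] if stack else -1)
    return tops

def top_of_stack_attn_mask(seq):
    """Boolean (seq_len, seq_len) mask: row q forbids attending to q's
    current top-of-stack key position."""
    n = len(seq)
    tops = top_of_stack_indices(seq)
    mask = torch.zeros(n, n, dtype=torch.bool)
    for q in range(n):
        if tops[q] >= 0:
            mask[q, tops[q]] = True
    return mask

seq = [0, 0, 1, 0, 1, 1]             # "(()())"
print(top_of_stack_indices(seq))     # [0, 1, 0, 3, 0, -1]
print(top_of_stack_attn_mask(seq).int())
```

Passed as the attn_mask of a torch.nn.MultiheadAttention layer (where True marks disallowed key positions), a mask like this removes exactly the attention edge the summary says accuracy depends on.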