Researchers investigated how transformer models represent hierarchical structure, specifically in tasks involving bracket sequences. They found that although hierarchical information such as nesting depth and position is decodable from the model's internal states, its causal role is not always clear: masking attention to the top-of-stack position significantly degraded accuracy, whereas altering residual-stream subspaces had minimal effect, suggesting that decodability does not imply causal usage.
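To make the two kinds of intervention concrete, here is a minimal PyTorch sketch (not the paper's code; the `top_of_stack` index and the probe-direction matrix `W` are hypothetical stand-ins). One function blocks attention to a chosen key position before the softmax; the other projects a decodable subspace out of the residual stream.

```python
import torch

def masked_attention(q, k, v, banned_positions=None):
    """Single-head scaled dot-product attention; optionally block attention
    *to* the given key positions (e.g. the top-of-stack token)."""
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5  # (T, T)
    if banned_positions is not None:
        scores[:, banned_positions] = float("-inf")        # queries cannot attend here
    return torch.softmax(scores, dim=-1) @ v

def ablate_subspace(h, W):
    """Remove from the residual stream h (T, d) the component lying in the
    subspace spanned by the columns of W (d, k), e.g. probe directions."""
    Q, _ = torch.linalg.qr(W)       # orthonormal basis for the subspace
    return h - (h @ Q) @ Q.T        # orthogonal projection off the subspace

# Toy comparison of the two interventions on random activations.
T, d = 8, 16
torch.manual_seed(0)
q, k, v = (torch.randn(T, d) for _ in range(3))
top_of_stack = 3                                 # hypothetical key position
out = masked_attention(q, k, v)
out_masked = masked_attention(q, k, v, banned_positions=[top_of_stack])
print((out - out_masked).norm())                 # nonzero: masking changes the output

h = torch.randn(T, d)
W = torch.randn(d, 2)                            # stand-in for probe-derived directions
h_ablated = ablate_subspace(h, W)
print((h_ablated @ W).norm())                    # ~0: subspace component removed
```

The contrast mirrors the reported finding: if accuracy drops under the attention mask but survives the subspace ablation, the information the probe decodes from that subspace is evidently not what the model causally relies on.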
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Investigates the causal role of internal representations in transformers, potentially informing future model interpretability research.
RANK_REASON Academic paper on transformer model interpretability.