Researchers investigated how transformer models represent hierarchical structure, specifically in tasks involving bracket sequences. They found that although hierarchical information such as nesting depth and position is decodable from the model's internal states, its causal role is not always clear: masking attention to the top-of-stack position significantly degraded accuracy, whereas altering residual-stream subspaces had minimal effect, suggesting that decodability does not imply causal usage.
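To make the two kinds of intervention concrete, here is a minimal PyTorch sketch (not the paper's code; the `top_of_stack` index and the probe-direction matrix `W` are hypothetical stand-ins). One function blocks attention to a chosen key position before the softmax; the other projects a decodable subspace out of the residual stream.

```python
import torch

def masked_attention(q, k, v, banned_positions=None):
    """Single-head scaled dot-product attention; optionally block attention
    *to* the given key positions (e.g. the top-of-stack token)."""
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5  # (T, T)
    if banned_positions is not None:
        scores[:, banned_positions] = float("-inf")        # queries cannot attend here
    return torch.softmax(scores, dim=-1) @ v

def ablate_subspace(h, W):
    """Remove from the residual stream h (T, d) the component lying in the
    subspace spanned by the columns of W (d, k), e.g. probe directions."""
    Q, _ = torch.linalg.qr(W)       # orthonormal basis for the subspace
    return h - (h @ Q) @ Q.T        # orthogonal projection off the subspace

# Toy comparison of the two interventions on random activations.
T, d = 8, 16
torch.manual_seed(0)
q, k, v = (torch.randn(T, d) for _ in range(3))
top_of_stack = 3                                 # hypothetical key position
out = masked_attention(q, k, v)
out_masked = masked_attention(q, k, v, banned_positions=[top_of_stack])
print((out - out_masked).norm())                 # nonzero: masking changes the output

h = torch.randn(T, d)
W = torch.randn(d, 2)                            # stand-in for probe-derived directions
h_ablated = ablate_subspace(h, W)
print((h_ablated @ W).norm())                    # ~0: subspace component removed
```

The contrast mirrors the reported finding: if accuracy drops under the attention mask but survives the subspace ablation, the information the probe decodes from that subspace is evidently not what the model causally relies on.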
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Investigates the causal role of internal representations in transformers, potentially informing future model interpretability research.
RANK_REASON Academic paper on transformer model interpretability.