Transformer residual streams show geometry of time, concentrate context

By PulseAugur Editorial · [1 source] · 2026-06-06 20:52

A preliminary writeup on LessWrong reports that the residual stream in transformers, often likened to working memory, has a distinct geometry related to time. In an analysis of the Gemma-2-2B model, information that persists across many tokens concentrates in a low-dimensional subspace rather than spreading diffusely across the stream. This persistent information depends strongly on sequential order: shuffling the tokens drastically shortens the timescale of these slow directions.
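The summary does not say how the writeup measures persistence, so the following is only a minimal sketch under stated assumptions: it uses synthetic data rather than Gemma-2-2B activations, treats each coordinate as a candidate direction, and uses lag-1 autocorrelation as a stand-in for "timescale". The names `make_synthetic_stream` and `lag1_autocorr` are hypothetical helpers, not taken from the source.

```python
import numpy as np

def make_synthetic_stream(n_tokens=4000, n_dims=32, n_slow=3, rho=0.98, seed=0):
    """Toy 'residual stream': n_slow slowly varying dims, the rest white noise.

    Assumption: slow directions are axis-aligned AR(1) processes. Real
    activations would need a learned basis; this is only an illustration.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_tokens, n_dims))
    for t in range(1, n_tokens):
        # AR(1) update: carry over most of the previous token's value.
        x[t, :n_slow] = rho * x[t - 1, :n_slow] + np.sqrt(1 - rho**2) * x[t, :n_slow]
    return x

def lag1_autocorr(x):
    """Lag-1 autocorrelation of each column of x (shape: tokens x dims)."""
    x = x - x.mean(axis=0)
    return (x[:-1] * x[1:]).sum(axis=0) / (x * x).sum(axis=0)

n_slow = 3
stream = make_synthetic_stream(n_slow=n_slow)

# Persistence in the original token order.
ac = lag1_autocorr(stream)

# Same vectors, token order shuffled: temporal structure is destroyed.
perm = np.random.default_rng(1).permutation(len(stream))
ac_shuffled = lag1_autocorr(stream[perm])

n_persistent = int((ac > 0.5).sum())
n_persistent_shuffled = int((ac_shuffled > 0.5).sum())
print("persistent dims (ordered): ", n_persistent)
print("persistent dims (shuffled):", n_persistent_shuffled)
```

In this toy setup only the few AR(1) coordinates stay correlated from one token to the next, and shuffling the token order removes that correlation. This mirrors the qualitative claim in the summary without reproducing the writeup's actual analysis.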

IMPACT Reveals how transformers might encode temporal information, potentially guiding future model architectures and interpretability methods.

RANK_REASON The cluster contains a research paper detailing experimental findings on transformer model internals.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

LessWrong (AI tag) TIER_1 English(EN) · Fodenthal · 2026-06-06 20:52

The Residual Stream Has a Geometry of Time

Preface

This is a preliminary writeup for an experiment on residual stream geometry. The research direction seems pretty underexplored, so I’m posting early to collect objections, research intuitions, and connections to problems other people are thinking about before …
