PulseAugur

Transformer residual streams have a geometry of time, concentrating persistent context in a few directions

A preliminary writeup reports that the residual stream in transformers, often likened to working memory, has a distinct geometry related to time. Analyzing the Gemma-2-2B model, the author found that information persisting across many tokens is concentrated in a low-dimensional subspace rather than spread diffusely across the stream. These persistent "slow" directions are highly sensitive to token order: shuffling the tokens sharply shortens their timescale.
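
The summary does not describe how the writeup measures a direction's timescale, so the following is only a hedged illustration of the idea, not the author's method. It assumes an AR(1)-style decay, where the lag-1 autocorrelation rho of activations projected onto a direction gives a timescale of tau = -1/ln(rho) tokens. It runs on a hypothetical synthetic stream (`synthetic_stream`, with a 2-D slow subspace), not on Gemma-2-2B activations, and every function name here is illustrative.

```python
import math
import random


def project(acts, direction):
    """Project each token's activation vector onto a unit-normalized direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    return [sum(a * u for a, u in zip(vec, unit)) for vec in acts]


def lag1_autocorr(xs):
    """Lag-1 autocorrelation of a 1-D series after mean-centering."""
    mean = sum(xs) / len(xs)
    c = [x - mean for x in xs]
    var = sum(v * v for v in c)
    if var == 0:
        return 0.0
    cov = sum(c[t] * c[t + 1] for t in range(len(c) - 1))
    return cov / var


def timescale(xs):
    """Timescale in tokens, ASSUMING AR(1)-like decay: tau = -1 / ln(rho)."""
    rho = lag1_autocorr(xs)
    if rho <= 0:
        return 0.0  # no positive carry-over from one token to the next
    if rho >= 1:
        return math.inf
    return -1.0 / math.log(rho)


def synthetic_stream(n_tokens, dim, slow_dims, phi=0.98, seed=0):
    """HYPOTHETICAL toy stream: the first `slow_dims` coordinates are slow AR(1)
    processes; the remaining coordinates are fresh white noise at every token."""
    rng = random.Random(seed)
    state = [0.0] * dim
    acts = []
    for _ in range(n_tokens):
        for i in range(dim):
            if i < slow_dims:
                state[i] = phi * state[i] + rng.gauss(0, 1)
            else:
                state[i] = rng.gauss(0, 1)
        acts.append(list(state))
    return acts


acts = synthetic_stream(n_tokens=2000, dim=8, slow_dims=2)
slow_dir = [1.0, 1.0, 0, 0, 0, 0, 0, 0]  # lies inside the slow 2-D subspace
fast_dir = [0, 0, 0, 0, 0, 0, 0, 1.0]    # a white-noise coordinate

shuffled = list(acts)
random.Random(1).shuffle(shuffled)  # destroy token order, keep the same vectors

print("slow, ordered: ", timescale(project(acts, slow_dir)))
print("slow, shuffled:", timescale(project(shuffled, slow_dir)))
print("fast, ordered: ", timescale(project(acts, fast_dir)))
```

In this toy setup, the slow direction has a timescale of tens of tokens only while token order is intact; shuffling removes the step-to-step correlation and its timescale collapses toward zero, which is the qualitative pattern the summary describes. Directions outside the slow subspace stay near zero either way.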

IMPACT Reveals how transformers might encode temporal information, potentially guiding future model architectures and interpretability methods.

RANK_REASON The cluster contains a research paper detailing experimental findings on transformer model internals.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. LessWrong (AI tag) · TIER_1 · English (EN) · Fodenthal

    The Residual Stream Has a Geometry of Time

    Preface

    This is a preliminary writeup for an experiment on residual stream geometry. The research direction seems pretty underexplored, so I’m posting early to collect objections, research intuitions, and connections to problems other people are thinking about before …