Accumulated transformations improve LLM length extrapolation, but degrade at extremes

By PulseAugur Editorial · [1 source] · 2026-06-25 04:00

Researchers have examined why accumulated transformations in attention mechanisms extrapolate to longer contexts. The starting point is PaTH Attention, which replaces RoPE's position-indexed rotations with accumulated data-dependent Householder reflections; this yields strong length extrapolation, though performance degrades at extreme context lengths. The study also explores a simpler variant using accumulated token-dependent rotations, which exhibits similar behavior. Theoretical analysis suggests that accumulated orthogonal transformations become incoherent after a finite number of steps, which limits attention to distant tokens and creates a finite mixing window.
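The finite mixing window can be illustrated with a toy numerical sketch. This is not the paper's method or experimental setup: random Gaussian vectors stand in for the data-dependent reflection vectors, and the helper names (`householder`, `accumulated_transform`, `mean_alignment`) are hypothetical. The sketch only shows that a product of Householder reflections stays orthogonal while a vector's alignment with its own transformed copy shrinks as more reflections accumulate.

```python
import numpy as np

def householder(v):
    """Return the reflection H = I - 2 v v^T / (v^T v)."""
    v = np.asarray(v, dtype=float)
    return np.eye(v.size) - 2.0 * np.outer(v, v) / np.dot(v, v)

def accumulated_transform(vectors):
    """Multiply one reflection per token into a single orthogonal matrix."""
    P = np.eye(vectors.shape[1])
    for v in vectors:
        P = householder(v) @ P
    return P

def mean_alignment(dim, distance, trials, rng):
    """Average |x^T P x| for a fixed unit vector x, where P accumulates
    `distance` reflections built from random vectors (an assumption that
    stands in for data-dependent vectors)."""
    x = np.zeros(dim)
    x[0] = 1.0
    total = 0.0
    for _ in range(trials):
        vs = rng.standard_normal((distance, dim))
        total += abs(x @ accumulated_transform(vs) @ x)
    return total / trials

rng = np.random.default_rng(0)
dim = 16

# A product of reflections is still orthogonal, so norms are preserved.
P = accumulated_transform(rng.standard_normal((50, dim)))
orthogonality_error = np.abs(P.T @ P - np.eye(dim)).max()

# Alignment after 1 accumulated reflection versus after 64.
near = mean_alignment(dim, 1, 200, rng)
far = mean_alignment(dim, 64, 200, rng)

print(f"orthogonality error: {orthogonality_error:.2e}")
print(f"mean alignment at distance 1:  {near:.3f}")
print(f"mean alignment at distance 64: {far:.3f}")
```

In this toy model the alignment at distance 64 is much lower than at distance 1, which loosely mirrors the summary's claim that distant tokens become hard to attend to; how quickly this happens in a trained model depends on the learned reflection vectors, which the sketch does not model.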

IMPACT Investigates limitations in current attention mechanisms for handling extreme context lengths, potentially guiding future architectural improvements.

RANK_REASON Academic paper detailing theoretical and experimental findings on attention mechanisms.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Mahesh Godavarti · 2026-06-25 04:00

Why Do Accumulated Transformations Extrapolate?

arXiv:2606.24975v1 · Announce Type: cross · Abstract: PaTH Attention showed that replacing RoPE's position-indexed rotations with accumulated data-dependent Householder reflections yields strong length extrapolation, though performance degrades at extreme context lengths. We ask whet…
