Linear transformer training dynamics reveal chaotic attractors at high learning rates

By PulseAugur Editorial · [2 sources] · 2026-05-20 15:25

Researchers have analyzed the training dynamics of simplified linear transformer models, focusing on how large learning rates affect convergence. Their study finds that beyond certain stability thresholds, training with a large learning rate can settle onto attractors that produce cycles or bounded chaos, or can diverge outright, instead of converging to a solution. The findings suggest that a large constant learning rate can fundamentally alter the learned transformer's behavior and determine whether training converges at all.
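As a rough illustration of the regimes described above, the sketch below runs plain gradient descent on a scalar two-factor toy loss, 0.5 * (a*b - 1)^2. This is an assumption made for illustration only: it is not the paper's model, and the names `run_gd` and `classify`, the starting point, and the step sizes are all hypothetical choices.

```python
def run_gd(lr, a=0.5, b=0.5, steps=2000, blowup=1e6):
    """Gradient descent on the toy loss 0.5 * (a*b - 1)**2 (illustrative only).

    Returns (losses, diverged). The run stops early once |a| or |b|
    exceeds `blowup`, which is treated as divergence.
    """
    losses = []
    for _ in range(steps):
        if abs(a) > blowup or abs(b) > blowup:
            return losses, True
        r = a * b - 1.0                          # residual of the product a*b
        losses.append(0.5 * r * r)
        a, b = a - lr * r * b, b - lr * r * a    # simultaneous update of both factors
    return losses, False


def classify(lr, tail=100, tol=1e-10):
    """Label a run by what the loss does over its last `tail` steps."""
    losses, diverged = run_gd(lr)
    if diverged:
        return "diverged"
    if max(losses[-tail:]) < tol:
        return "converged"
    return "bounded, not converged"


if __name__ == "__main__":
    # Hypothetical step sizes chosen to land in different regimes of this toy.
    for lr in (0.1, 1.1, 1.9, 2.5):
        print(f"lr={lr}: {classify(lr)}")
```

With the symmetric start a = b, each iterate follows the one-dimensional map x -> x - lr * x * (x^2 - 1), whose fixed point x = 1 has slope 1 - 2 * lr and so loses stability once lr exceeds 1 in this toy. The labels only separate convergence, bounded non-convergence, and blow-up; telling a cycle from chaos would need a further diagnostic such as a Lyapunov exponent estimate.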

IMPACT Reveals how large learning rates can destabilize transformer training, leading to chaotic dynamics instead of convergence.

RANK_REASON The cluster contains an academic paper detailing novel research findings on transformer model training dynamics.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv stat.ML TIER_1 English(EN) · Krishnakumar Balasubramanian · 2026-05-21 04:00

Large-Step Training Dynamics of a Two-Factor Linear Transformer Model

arXiv:2605.21292v1 Announce Type: new Abstract: Gradient-flow analyses show that simplified linear transformers can learn the in-context linear-regression algorithm, but they do not explain the finite-step behavior of gradient descent at large learning rates. Motivated by empiric…

arXiv stat.ML TIER_1 English(EN) · Krishnakumar Balasubramanian · 2026-05-20 15:25

Large-Step Training Dynamics of a Two-Factor Linear Transformer Model

Gradient-flow analyses show that simplified linear transformers can learn the in-context linear-regression algorithm, but they do not explain the finite-step behavior of gradient descent at large learning rates. Motivated by empirical work on high-learning-rate transformer instab…
