PulseAugur
research · [2 sources] ·

Linear transformer training dynamics reveal chaotic attractors at high learning rates

Researchers analyzed the training dynamics of a simplified two-factor linear transformer model, focusing on how large learning rates affect gradient descent. Earlier gradient-flow analyses show that such models can learn the in-context linear-regression algorithm, but they do not explain finite-step behavior at large learning rates. The study finds that beyond certain stability thresholds, training no longer converges directly to a solution: the iterates can instead settle into cycles, remain in bounded chaos, or diverge. The findings suggest that a large constant learning rate can fundamentally alter the behavior of the learned transformer, not just how quickly training converges.
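The regimes described above can be seen in a much smaller toy problem. The sketch below is an illustration only and is not the model from the paper: it runs gradient descent on the hypothetical two-factor scalar loss 0.5 * (a*b - 1)^2 with balanced initialization a = b, and the function name run_gd, the learning rates, and the thresholds are all assumptions chosen for this example. In this toy, linearizing the update around a = b = 1 gives a multiplier of 1 - 2*lr, so the solution is stable only for lr < 1; past that point the iterates stop converging even though they may stay bounded.

```python
def run_gd(lr, steps=2000, a0=0.5, b0=0.5, blowup=1e6):
    """Gradient descent on the toy loss 0.5 * (a*b - 1)**2 (not the paper's model)."""
    a, b = a0, b0
    history = []
    for _ in range(steps):
        residual = a * b - 1.0
        history.append(0.5 * residual * residual)
        # Simultaneous gradient step on both factors.
        a, b = a - lr * residual * b, b - lr * residual * a
        # Stop early once the iterates have clearly escaped.
        if abs(a) > blowup or abs(b) > blowup:
            return history, "diverged"
    # Call the run converged only if the loss stayed near zero at the end.
    if max(history[-100:]) < 1e-12:
        return history, "converged"
    return history, "bounded, not converged"


# Hypothetical learning rates chosen to land in different regimes of this toy.
for lr in (0.5, 1.1, 1.8, 2.5):
    losses, outcome = run_gd(lr)
    print(f"lr={lr}: {outcome}, last two losses={losses[-2:]}")
```

With these settings the smallest rate converges, the next one alternates between two loss values (a period-2 cycle), the third stays bounded without settling, and the largest blows up within a few steps. Whether the paper's model has the same thresholds or the same sequence of regimes is not something this sketch can show.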

Summary written by gemini-2.5-flash-lite from 2 sources.

IMPACT Reveals how large learning rates can destabilize transformer training, leading to chaotic dynamics instead of convergence.

RANK_REASON The cluster contains an academic paper detailing novel research findings on transformer model training dynamics.


COVERAGE [2]

  1. arXiv stat.ML TIER_1 · Krishnakumar Balasubramanian ·

    Large-Step Training Dynamics of a Two-Factor Linear Transformer Model

    arXiv:2605.21292v1. Abstract: Gradient-flow analyses show that simplified linear transformers can learn the in-context linear-regression algorithm, but they do not explain the finite-step behavior of gradient descent at large learning rates. Motivated by empiric…

  2. arXiv stat.ML TIER_1 · Krishnakumar Balasubramanian ·

    Large-Step Training Dynamics of a Two-Factor Linear Transformer Model

    Gradient-flow analyses show that simplified linear transformers can learn the in-context linear-regression algorithm, but they do not explain the finite-step behavior of gradient descent at large learning rates. Motivated by empirical work on high-learning-rate transformer instab…