Large-Step Training Dynamics of a Two-Factor Linear Transformer Model
Researchers have analyzed the training dynamics of simplified linear transformer models, focusing on how large learning rates affect convergence. The study finds that beyond certain stability thresholds, training with a high learning rate does not settle directly on a solution: it is drawn to attractors that produce cycles, bounded chaos, or divergence. The findings suggest that a large constant learning rate can fundamentally alter the behavior of the learned transformer and change the outcome of training.
IMPACT Reveals how large learning rates can destabilize transformer training, leading to chaotic dynamics instead of convergence.
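
For intuition about what a stability threshold means here, the sketch below runs constant-step gradient descent on a scalar two-factor toy problem, L(a, b) = 0.5 * (a*b - target)^2. This toy is an assumption chosen for illustration and is not the model analyzed in the study; the function names, starting points, and learning rates are likewise hypothetical. At a minimum of this loss the largest Hessian eigenvalue is a^2 + b^2, so gradient descent is locally stable only when the learning rate is below 2 / (a^2 + b^2).

```python
import math


def local_stability_threshold(a, b):
    """Largest locally stable learning rate at a minimum (a, b) of the toy loss.

    At a minimum of 0.5 * (a*b - target)**2 the Hessian has eigenvalues
    0 and a**2 + b**2, so the classical bound is lr < 2 / (a**2 + b**2).
    """
    return 2.0 / (a * a + b * b)


def run_gd(a, b, lr, steps, target=1.0, blowup=1e6):
    """Constant-step gradient descent on the toy loss 0.5 * (a*b - target)**2.

    Returns (a, b, losses, diverged). The run stops early and reports
    diverged=True once the residual is non-finite or exceeds `blowup`.
    """
    losses = []
    for _ in range(steps):
        r = a * b - target  # residual of the two-factor product
        if not math.isfinite(r) or abs(r) > blowup:
            return a, b, losses, True
        losses.append(0.5 * r * r)
        # Simultaneous update: both gradients use the old (a, b).
        a, b = a - lr * r * b, b - lr * r * a
    return a, b, losses, False


if __name__ == "__main__":
    # Small step: well below the threshold at the minimum it reaches.
    a, b, losses, diverged = run_gd(1.0, 0.5, lr=0.1, steps=500)
    print("small lr:", "diverged" if diverged else f"final loss {losses[-1]:.3e}")

    # Very large step: far above any threshold near the starting point.
    a, b, losses, diverged = run_gd(1.0, 0.5, lr=5.0, steps=500)
    print("large lr:", "diverged" if diverged else f"final loss {losses[-1]:.3e}")
```

Sweeping `lr` between these two extremes and plotting the late-time losses is one way to look for the intermediate regimes the summary mentions; this toy is not claimed to reproduce the study's cycles or bounded chaos.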