PulseAugur

Muon optimizer's momentum acts as spectral filter, study finds

Researchers have proposed a theoretical framework that explains why momentum is effective in the Muon optimizer, which is commonly used for large language model training. Their analysis shows that momentum acts as a spectral filter: it suppresses noise in the gradients while preserving the dominant signal. This filtering enlarges the spectral gap, which stabilizes the matrix updates and makes them more reliable. Experiments across several tasks, including LLM pretraining, support these findings and suggest that momentum may play a similar role in other matrix-based optimizers.
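The spectral-filtering idea can be illustrated with a toy NumPy experiment. This is a minimal sketch under stated assumptions, not the paper's setup: the "gradient" is a synthetic rank-1 signal plus i.i.d. Gaussian noise, momentum is a plain exponential moving average, and orthogonalization uses an exact SVD polar factor as a stand-in for the Newton-Schulz iteration that Muon implementations typically use. The names `run_demo`, `spectral_gap_ratio`, and `orthogonalize`, and all parameter values, are hypothetical.

```python
import numpy as np


def spectral_gap_ratio(m):
    """Ratio of the largest to the second-largest singular value."""
    s = np.linalg.svd(m, compute_uv=False)
    return s[0] / s[1]


def orthogonalize(m):
    """Exact polar factor U V^T via SVD (stand-in for Newton-Schulz)."""
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt


def run_demo(n=64, steps=200, beta=0.95, signal_strength=20.0,
             noise_std=1.0, seed=0):
    rng = np.random.default_rng(seed)

    # Assumed toy model: a fixed rank-1 "signal" gradient.
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    signal = signal_strength * np.outer(u, v)

    momentum = np.zeros((n, n))
    grad = signal
    for _ in range(steps):
        # Each step sees the same signal with fresh noise.
        grad = signal + noise_std * rng.standard_normal((n, n))
        # EMA momentum: noise averages out, the signal persists.
        momentum = beta * momentum + (1.0 - beta) * grad

    # Compare the last raw gradient with the momentum buffer.
    return spectral_gap_ratio(grad), spectral_gap_ratio(momentum), momentum


raw_gap, mom_gap, momentum = run_demo()
update = orthogonalize(momentum)
print(f"spectral gap ratio, raw gradient: {raw_gap:.2f}")
print(f"spectral gap ratio, momentum:     {mom_gap:.2f}")
```

In this toy setting the momentum buffer shows a larger ratio between its top two singular values than a single noisy gradient does, so the leading singular direction passed to the orthogonalization step is better separated from the noise.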

IMPACT Provides theoretical grounding for momentum in LLM training optimizers, potentially improving training stability and efficiency.

RANK_REASON This is a research paper detailing a theoretical analysis and experimental validation of an optimization technique.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.LG TIER_1 English(EN) · Xianliang Li, Zihan Zhang, Weiyang Liu, Han Bao ·

    Denoise First, Orthogonalize Later: Understanding Momentum in Muon via Spectral Filtering

    arXiv:2606.03899v1. Abstract: Muon has recently demonstrated strong empirical performance in large language model training, but the theoretical role of momentum in Muon remains unclear. Existing analyses of Muon either remove momentum to study spectral updates i…

  2. arXiv cs.LG TIER_1 English(EN) · Han Bao ·

    Denoise First, Orthogonalize Later: Understanding Momentum in Muon via Spectral Filtering

    Muon has recently demonstrated strong empirical performance in large language model training, but the theoretical role of momentum in Muon remains unclear. Existing analyses of Muon either remove momentum to study spectral updates in isolation, or retain momentum without explaini…