PulseAugur
EN
LIVE 11:15:11

Muon^2 optimizer boosts foundation model training efficiency

Researchers have developed Muon$^2$, an enhanced version of the Muon optimizer designed for large-scale foundation model pre-training. Muon$^2$ improves efficiency and quality by incorporating Adam-style adaptive second-moment preconditioning before orthogonalization, addressing the computational costs associated with Muon's iterative orthogonalization process. Experiments with GPT, LLaMA, and Mixture-of-Experts models up to 13B parameters show that Muon$^2$ reduces the need for Newton-Schulz iterations by 40% and can save up to a quarter of training time compared to Muon while achieving similar loss. AI

IMPACT Muon^2 offers a more efficient training process for large foundation models, potentially reducing computational costs and accelerating development cycles.

RANK_REASON The cluster contains two academic papers detailing advancements in optimization algorithms for large-scale model training.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 2 sources. How we write summaries →

COVERAGE [2]

  1. arXiv cs.AI TIER_1 English(EN) · Ziyue Liu, Ruijie Zhang, Zhengyang Wang, Yequan Zhao, Yupeng Su, Zi Yang, Zheng Zhang ·

    Muon$^2$: Boosting Muon via Adaptive Second-Moment Preconditioning

    arXiv:2604.09967v2 Announce Type: replace-cross Abstract: Muon has emerged as a promising optimizer for large-scale foundation model pre-training by exploiting the matrix structure of neural network updates through iterative orthogonalization. However, the orthogonalization quali…

  2. arXiv cs.LG TIER_1 English(EN) · Naoki Sato, Hiroki Naganuma, Hideaki Iiduka ·

    Convergence Bound and Critical Batch Size of Muon Optimizer

    arXiv:2507.01598v5 Announce Type: replace Abstract: Muon, a recently proposed optimizer that leverages the inherent matrix structure of neural network parameters, has demonstrated strong empirical performance, indicating its potential as a successor to standard optimizers such as…