Muon^2 optimizer boosts foundation model training efficiency

By PulseAugur Editorial · [2 sources] · 2026-06-09 04:00

Researchers have developed Muon$^2$, an enhanced version of the Muon optimizer designed for large-scale foundation model pre-training. Muon$^2$ applies Adam-style adaptive second-moment preconditioning to the update before orthogonalization, which targets the computational cost of Muon's iterative orthogonalization step. In experiments with GPT, LLaMA, and Mixture-of-Experts models of up to 13B parameters, Muon$^2$ needs 40% fewer Newton-Schulz iterations and can save up to a quarter of training time compared to Muon while reaching similar loss.
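The idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the hyperparameter defaults, the choice to precondition the momentum buffer (rather than the raw gradient), and the use of the classic cubic Newton-Schulz iteration in place of whatever tuned polynomial the paper uses are all assumptions made for illustration.

```python
import numpy as np


def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximate the orthogonal polar factor of G.

    Uses the classic cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X,
    chosen here for simplicity (an assumption, not the paper's variant).
    """
    # Frobenius normalization puts all singular values in (0, 1],
    # inside the convergence region of the cubic iteration.
    X = G / (np.linalg.norm(G) + eps)
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T  # iterate on the wide orientation so X @ X.T is the small Gram matrix
    for _ in range(steps):
        X = 1.5 * X - 0.5 * (X @ X.T) @ X
    return X.T if transposed else X


def preconditioned_orthogonal_update(momentum, second_moment, grad,
                                     beta1=0.95, beta2=0.95,
                                     eps=1e-8, ns_steps=3):
    """One hypothetical step: Adam-style scaling first, then orthogonalization."""
    # Exponential moving averages of the gradient and its elementwise square.
    momentum = beta1 * momentum + (1.0 - beta1) * grad
    second_moment = beta2 * second_moment + (1.0 - beta2) * grad ** 2
    # Adam-style elementwise preconditioning applied BEFORE orthogonalization.
    preconditioned = momentum / (np.sqrt(second_moment) + eps)
    update = newton_schulz_orthogonalize(preconditioned, steps=ns_steps)
    return update, momentum, second_moment


grad = np.array([[3.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0]])
update, m, v = preconditioned_orthogonal_update(
    np.zeros_like(grad), np.zeros_like(grad), grad, ns_steps=10
)
```

In this toy case the elementwise scaling equalizes the two nonzero gradient entries before orthogonalization, so the iteration starts from a better-conditioned matrix. That is the intuition behind needing fewer Newton-Schulz steps. Real gradients are not diagonal, so the effect there is an empirical question that the paper's experiments address.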

IMPACT Muon^2 offers a more efficient training process for large foundation models, potentially reducing computational costs and accelerating development cycles.

RANK_REASON The cluster contains two academic papers detailing advancements in optimization algorithms for large-scale model training.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.AI TIER_1 English(EN) · Ziyue Liu, Ruijie Zhang, Zhengyang Wang, Yequan Zhao, Yupeng Su, Zi Yang, Zheng Zhang · 2026-06-09 04:00

Muon$^2$: Boosting Muon via Adaptive Second-Moment Preconditioning

arXiv:2604.09967v2 Announce Type: replace-cross Abstract: Muon has emerged as a promising optimizer for large-scale foundation model pre-training by exploiting the matrix structure of neural network updates through iterative orthogonalization. However, the orthogonalization quali…

arXiv cs.LG TIER_1 English(EN) · Naoki Sato, Hiroki Naganuma, Hideaki Iiduka · 2026-06-09 04:00

Convergence Bound and Critical Batch Size of Muon Optimizer

arXiv:2507.01598v5 Announce Type: replace Abstract: Muon, a recently proposed optimizer that leverages the inherent matrix structure of neural network parameters, has demonstrated strong empirical performance, indicating its potential as a successor to standard optimizers such as…
