Muon optimizer shows training efficiency gains over Adam

By PulseAugur Editorial · [2 sources] · 2026-06-03 09:40

A new research paper examines why the Muon optimizer outperforms Adam in large language model training. The paper, titled "Why Muon Outperforms Adam: A Curvature Perspective," suggests that Muon is more efficient because its updates incur a smaller second-order curvature penalty. It attributes this advantage to lower Normalized Directional Sharpness (NDS) rather than to differences in update scale, with data imbalance and within-layer curvature playing significant roles.
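To make the "second-order curvature penalty" concrete, here is a minimal sketch on a toy quadratic loss. It is not the paper's code. It splits the second-order Taylor estimate of the loss change for a step d, g·d + ½ dᵀHd, into its linear gain and its curvature penalty. The summary does not give the paper's definition of NDS, so the hypothetical `directional_sharpness` helper uses the plain Rayleigh quotient dᵀHd / ‖d‖² as an illustrative stand-in; the function names, the toy Hessian, and the step size are all assumptions.

```python
import numpy as np

def second_order_terms(grad, hessian, step):
    """Split the quadratic Taylor estimate of the loss change into its two parts."""
    first_order = float(grad @ step)  # linear term, negative for a descent step
    curvature_penalty = 0.5 * float(step @ hessian @ step)  # second-order term
    return first_order, curvature_penalty

def directional_sharpness(hessian, step):
    """Rayleigh quotient of the Hessian along the step.

    Illustrative stand-in only; not the paper's NDS definition.
    """
    return float(step @ hessian @ step) / float(step @ step)

# Toy quadratic loss L(x) = 0.5 * x^T H x with one sharp and one flat axis.
H = np.diag([10.0, 1.0])
x = np.array([1.0, 1.0])
g = H @ x  # gradient of the toy loss at x

lr = 0.1  # assumed step size
step_sharp = -lr * np.array([1.0, 0.0])  # step along the sharp axis
step_flat = -lr * np.array([0.0, 1.0])   # same norm, along the flat axis

for name, step in [("sharp", step_sharp), ("flat", step_flat)]:
    gain, penalty = second_order_terms(g, H, step)
    print(name, gain, penalty, directional_sharpness(H, step))
```

Both steps have the same norm, but the step along the sharp axis pays a curvature penalty ten times larger. That is the sense in which the direction of an update, and not only its scale, determines the second-order cost.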

IMPACT Offers a curvature-based explanation of why Muon trains more efficiently than Adam, which could inform more efficient LLM training.

RANK_REASON The cluster contains an academic paper detailing a new perspective on optimizer performance.


AI-generated summary · Google Gemini · from 2 sources.


COVERAGE [2]

arXiv cs.AI TIER_1 English(EN) · Shuche Wang, Fengzhuo Zhang, Jiaxiang Li, Dirk Bergemann, Zhuoran Yang · 2026-06-04 04:00

Why Muon Outperforms Adam: A Curvature Perspective

arXiv:2606.04662v1 · Abstract: Muon improves training efficiency over Adam in large language-model training by about two times, but the local geometric source of this advantage remains unclear. Our work takes a first step toward demystifying Muon's superiority …
arXiv cs.AI TIER_1 English(EN) · Zhuoran Yang · 2026-06-03 09:40

Why Muon Outperforms Adam: A Curvature Perspective

Muon improves training efficiency over Adam in large language-model training by about two times, but the local geometric source of this advantage remains unclear. Our work takes a first step toward demystifying Muon's superiority over Adam from a curvature perspective. First, we …
