PulseAugur

Muon optimizer shows training efficiency gains over Adam

A new research paper, "Why Muon Outperforms Adam: A Curvature Perspective," examines why the Muon optimizer trains large language models more efficiently than Adam, a gain the abstract puts at about two times. The study suggests that Muon is more efficient because each update incurs a smaller second-order curvature penalty. It attributes this advantage to lower Normalized Directional Sharpness (NDS) rather than to differences in update scale, with data imbalance and within-layer curvature playing significant roles.
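For readers unfamiliar with the term, the "second-order curvature penalty" can be read off a standard Taylor expansion of the loss around the current weights. The sketch below is a generic illustration, not the paper's derivation: the symbols $L$, $w$, $d$, $\eta$, $g$, and $H$ are introduced here for illustration, and the final ratio is only one plausible scale-free sharpness measure. The paper's exact definition of NDS is not given in this summary and may differ.

```latex
% Generic second-order expansion of the loss L after a step of size \eta
% along an update direction d (illustrative notation, not the paper's).
\[
  L(w - \eta d) \;\approx\; L(w)
    \;-\; \eta \,\langle g, d \rangle
    \;+\; \frac{\eta^{2}}{2}\, d^{\top} H d,
  \qquad g = \nabla L(w), \quad H = \nabla^{2} L(w).
\]
% The first-order term is the expected decrease; the quadratic term is the
% curvature penalty that offsets it.
%
% ASSUMPTION: one scale-free way to compare directions is to divide the
% quadratic term by the squared norm of the direction, which removes the
% effect of update scale. This is a stand-in, not the paper's NDS formula.
\[
  S(d) \;=\; \frac{d^{\top} H d}{\lVert d \rVert^{2}} .
\]
```

Under this reading, two optimizers that take steps of the same size can still differ in efficiency if one of them tends to pick directions with a smaller value of $S(d)$.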

IMPACT Provides a deeper understanding of optimization techniques, potentially leading to more efficient LLM training.

RANK_REASON The cluster contains an academic paper detailing a new perspective on optimizer performance.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.AI TIER_1 English(EN) · Shuche Wang, Fengzhuo Zhang, Jiaxiang Li, Dirk Bergemann, Zhuoran Yang ·

    Why Muon Outperforms Adam: A Curvature Perspective

    arXiv:2606.04662v1. Abstract: Muon improves training efficiency over Adam in large language-model training by about two times, but the local geometric source of this advantage remains unclear. Our work takes a first step toward demystifying Muon's superiority …

  2. arXiv cs.AI TIER_1 English(EN) · Zhuoran Yang ·

    Why Muon Outperforms Adam: A Curvature Perspective

    Muon improves training efficiency over Adam in large language-model training by about two times, but the local geometric source of this advantage remains unclear. Our work takes a first step toward demystifying Muon's superiority over Adam from a curvature perspective. First, we …