Muon optimizer boosts LLM training efficiency over Adam

By PulseAugur Editorial Team · [1 source] · 2026-06-04 04:00

Researchers have analyzed why the Muon optimizer trains large language models more efficiently than Adam. Their analysis indicates that Muon achieves a larger loss reduction per step because it incurs a smaller penalty from the curvature of the loss landscape. The advantage stems primarily from Muon's lower Normalized Directional Sharpness (NDS) rather than from differences in update scale, and it is most pronounced when the training data is imbalanced.
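The curvature argument can be illustrated with the standard second-order expansion of the loss: a step of size eta along a direction d changes the loss by roughly -eta * <g, d> + (eta^2 / 2) * d^T H d, where g is the gradient and H the Hessian. The sketch below is a minimal toy on a two-dimensional quadratic, not the paper's method. The function names, the toy values, and in particular `scale_free_sharpness` are assumptions made for illustration; the paper's exact definition of NDS is not given in this summary and may differ.

```python
import numpy as np


def second_order_terms(grad, direction, hessian):
    """Return (linear_gain, curvature) for the step w - eta * direction."""
    linear_gain = float(grad @ direction)                 # first-order decrease per unit eta
    curvature = float(direction @ hessian @ direction)    # d^T H d, the curvature penalty
    return linear_gain, curvature


def predicted_decrease(grad, direction, hessian, eta):
    """Second-order Taylor estimate of the loss decrease for step size eta."""
    linear_gain, curvature = second_order_terms(grad, direction, hessian)
    return eta * linear_gain - 0.5 * eta**2 * curvature


def scale_free_sharpness(grad, direction, hessian):
    """Hypothetical sharpness ratio (an assumption, not the paper's NDS).

    Dividing d^T H d by <g, d>^2 cancels any rescaling of `direction`,
    so the value depends only on where the update points.
    """
    linear_gain, curvature = second_order_terms(grad, direction, hessian)
    return curvature / linear_gain**2


# Toy quadratic with one sharp and one flat axis (illustrative values only).
hessian = np.diag([10.0, 1.0])
grad = np.array([2.0, 0.5])

gradient_direction = grad           # plain gradient-descent direction
sign_direction = np.sign(grad)      # crude stand-in for a sign-like update

for name, d in [("gradient", gradient_direction), ("sign", sign_direction)]:
    gain, curv = second_order_terms(grad, d, hessian)
    best_eta = gain / curv          # step size that maximizes the quadratic estimate
    print(name,
          "sharpness:", scale_free_sharpness(grad, d, hessian),
          "best decrease:", predicted_decrease(grad, d, hessian, best_eta))
```

Under this toy definition, the best achievable decrease along a direction equals 1 / (2 * sharpness), so a direction with a lower ratio yields a larger loss reduction per step regardless of how the update is scaled.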

Impact: Explains a key factor in improving LLM training speed and efficiency.

Ranking rationale: Academic paper detailing a novel optimization technique for LLMs.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI TIER_1 English(EN) · Shuche Wang, Fengzhuo Zhang, Jiaxiang Li, Dirk Bergemann, Zhuoran Yang · 2026-06-04 04:00

Why Muon Outperforms Adam: A Curvature Perspective

arXiv:2606.04662v1 Announce Type: cross Abstract: Muon improves training efficiency over Adam in large language-model training by about two times, but the local geometric source of this advantage remains unclear. Our work takes a first step toward demystifying Muon's superiority …
