PulseAugur
Why Muon Outperforms Adam: A Curvature Perspective

Muon Optimizer Outperforms Adam in Feature Learning

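A new research paper and an accompanying analysis examine the Muon optimizer's performance advantage over Adam, particularly when training large language models (LLMs) and vision classifiers. The research finds that the features Muon learns are more robust and more transferable: they hold up better on corrupted data and transfer better to downstream tasks. The analysis attributes this advantage to Muon's ability to reduce the curvature penalty by keeping normalized directional sharpness low in the middle and late stages of training, an effect that data imbalance amplifies.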
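The curvature argument can be made concrete with a second-order Taylor expansion. As a sketch, assume a quadratic model around parameters $\theta$ with gradient $g$, Hessian $H$, step size $\eta$, and update direction $d$. This notation is introduced here and is not taken from the paper. Also assume that "normalized directional sharpness" means the Rayleigh quotient $S(d)$ below; the paper's exact normalization may differ:

$$
L(\theta - \eta d) \approx L(\theta) - \eta\, g^\top d + \frac{\eta^2}{2}\, d^\top H d,
\qquad
S(d) = \frac{d^\top H d}{\lVert d \rVert_2^2}.
$$

The last term of the expansion is the curvature penalty, and it factors as

$$
\frac{\eta^2}{2}\, d^\top H d = \frac{\eta^2}{2}\, \lVert d \rVert_2^2\, S(d).
$$

For a fixed step size and update norm, a direction with lower $S(d)$ therefore loses less of the first-order decrease $\eta\, g^\top d$ to curvature.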

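Impact: Muon's ability to learn more robust and more transferable features could make training future large language models and other AI systems more efficient and effective.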

Ranking rationale: This cluster contains academic papers detailing new research findings on optimization for AI models.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 3 sources. How we write summaries →

Sources [3]

  1. arXiv cs.AI · TIER_1 · English · Tianyu Ruan, Fengzhuo Zhang, Shuche Wang, Shihua Zhang

    Muon Learns More Robust and Transferable Features than Adam

    arXiv:2606.09658v1 Announce Type: cross Abstract: Muon has recently emerged as a state-of-the-art optimizer for pretraining Large Language Models (LLMs) and vision classifiers. Despite its efficiency advantage over Adam and SGD, the feature-learning advantage of Muon remains uncl…

  2. arXiv cs.AI · TIER_1 · English · Shihua Zhang

    Muon Learns Features That Are More Robust and More Transferable than Adam's

    Muon has recently emerged as a state-of-the-art optimizer for pretraining Large Language Models (LLMs) and vision classifiers. Despite its efficiency advantage over Adam and SGD, the feature-learning advantage of Muon remains unclear. This paper investigates Muon's feature-learni…

  3. Hugging Face Daily Papers · TIER_1 · English

    Why Muon Outperforms Adam: A Curvature Perspective

    Muon outperforms Adam in large language model training by reducing curvature penalties through lower normalized directional sharpness, particularly in middle and late training stages, with advantages amplified by data imbalance and heterogeneous curvature.
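The quantities in these summaries can be computed on a toy problem. The sketch below is hypothetical and is not taken from either paper. It makes several assumptions:

- The loss is a quadratic $L(W) = \tfrac12 \sum_{ij} h_{ij} W_{ij}^2$ with a diagonal Hessian. Its entries `h` span four orders of magnitude, as a stand-in for heterogeneous curvature.
- Muon's update direction is approximated by the quintic Newton-Schulz orthogonalization used in the public Muon reference implementation. The coefficients are assumed from that code.
- Adam's direction is approximated by its first bias-corrected step, which reduces to roughly `sign(g)`.
- Sharpness is the Rayleigh quotient $d^\top H d / \lVert d \rVert^2$.

The helper names `newton_schulz_orthogonalize`, `adam_like_direction`, `directional_sharpness`, and `curvature_penalty` are illustrative. The sketch makes no claim about which direction comes out sharper.

```python
import numpy as np


def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 5) -> np.ndarray:
    """Approximate the orthogonal factor U V^T of g = U S V^T (Muon-style direction).

    Coefficients are assumed from the public Muon reference code, not from the paper.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (np.linalg.norm(g) + 1e-7)  # Frobenius normalization: all singular values <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T  # work in the wide orientation so the Gram matrix is the smaller one
    for _ in range(steps):
        gram = x @ x.T
        # Quintic map on singular values: s -> a*s + b*s^3 + c*s^5
        x = a * x + (b * gram + c * gram @ gram) @ x
    return x.T if transposed else x


def adam_like_direction(g: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # Adam's first bias-corrected step: m_hat = g, v_hat = g**2, so the update is ~sign(g)
    return g / (np.abs(g) + eps)


def directional_sharpness(d: np.ndarray, h: np.ndarray) -> float:
    # Rayleigh quotient d^T H d / ||d||^2 for the diagonal Hessian H = diag(h)
    return float(np.sum(h * d**2) / np.sum(d**2))


def curvature_penalty(d: np.ndarray, h: np.ndarray, lr: float) -> float:
    # Second-order Taylor term (lr^2 / 2) * d^T H d
    return 0.5 * lr**2 * float(np.sum(h * d**2))


rng = np.random.default_rng(0)
h = 10.0 ** rng.uniform(-2, 2, size=(8, 16))  # hypothetical heterogeneous curvature
w = rng.standard_normal((8, 16))
g = h * w  # gradient of L(W) = 0.5 * sum(h * W**2)

for name, d in [("adam_like", adam_like_direction(g)),
                ("muon_like", newton_schulz_orthogonalize(g))]:
    print(name, directional_sharpness(d, h), curvature_penalty(d, h, lr=0.02))
```

Changing the spread of `h` is a quick way to see how each direction's sharpness responds to more or less heterogeneous curvature. A toy diagonal quadratic is far from an LLM loss landscape, so treat any numbers it prints as illustrative only.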