Denoise First, Orthogonalize Later: Understanding Momentum in Muon via Spectral Filtering

Study finds that momentum in the Muon optimizer acts as a spectral filter

By PulseAugur Editorial Team · [3 sources] · 2026-06-02 16:54

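Researchers have published a paper detailing the theoretical foundations of the Muon optimizer, a technique used in large language model training. The study shows that Muon's momentum acts as a spectral filter, suppressing noise while preserving the core signal of the gradient. This filtering widens the spectral gap and stabilizes the matrix updates, giving more reliable results, particularly when momentum is applied before orthogonalization.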
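The filtering effect described above can be illustrated with a toy NumPy experiment. This is a minimal sketch under stated assumptions, not the paper's method: the low-rank "true gradient", the Gaussian noise model, the exponential-moving-average form of momentum, and all function names are hypothetical choices made for illustration. It also orthogonalizes with an exact SVD for clarity, whereas Muon itself uses a Newton-Schulz iteration.

```python
import numpy as np

def orthogonalize(m):
    """Set every singular value of m to 1 (exact SVD polar factor).
    Assumption: stands in for Muon's Newton-Schulz approximation."""
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt

def spectral_gap_ratio(m, rank):
    """Ratio of the rank-th singular value to the next one."""
    s = np.linalg.svd(m, compute_uv=False)  # sorted in descending order
    return s[rank - 1] / s[rank]

def momentum_buffer(signal, noise_std, beta, steps, rng):
    """Hypothetical EMA momentum over noisy gradients:
    m <- beta * m + (1 - beta) * g, with g = signal + Gaussian noise."""
    m = np.zeros_like(signal)
    for _ in range(steps):
        g = signal + noise_std * rng.standard_normal(signal.shape)
        m = beta * m + (1.0 - beta) * g
    return m

rng = np.random.default_rng(0)
n, rank = 64, 2

# Hypothetical rank-2 "true gradient" with both singular values equal to 20.
a, _ = np.linalg.qr(rng.standard_normal((n, rank)))
b, _ = np.linalg.qr(rng.standard_normal((n, rank)))
signal = 20.0 * a @ b.T

# One noisy gradient versus a momentum-averaged one.
raw = signal + 1.0 * rng.standard_normal((n, n))
smoothed = momentum_buffer(signal, noise_std=1.0, beta=0.9, steps=200, rng=rng)

gap_raw = spectral_gap_ratio(raw, rank)
gap_smoothed = spectral_gap_ratio(smoothed, rank)
print(f"gap ratio without momentum: {gap_raw:.2f}")
print(f"gap ratio with momentum:    {gap_smoothed:.2f}")

# Denoise first, orthogonalize later: the update is built from the buffer.
update = orthogonalize(smoothed)
```

In this toy setting, averaging shrinks the noise singular values while leaving the signal ones nearly unchanged, so the gap between the two groups is wider by the time the matrix is orthogonalized.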

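Impact: Provides a theoretical foundation for the optimization techniques used in LLM training, which may help improve future model development.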

Ranking rationale: An academic paper published on arXiv that presents a theoretical analysis of an optimization technique.

AI-generated summary · Google Gemini · from 3 sources.

Sources [3]

arXiv cs.AI TIER_1 English(EN) · Gagik Magakyan, Pablo Parrilo, Asuman Ozdaglar · 2026-06-04 04:00

Spectral Scaling Laws of Muon

arXiv:2606.04058v1 Announce Type: cross Abstract: Orthonormalized update rules have rapidly become a leading choice of optimizer for training large language models, with recent open-source state-of-the-art models adopting Muon. To keep these updates tractable, Muon performs the o…
arXiv cs.LG TIER_1 English(EN) · Xianliang Li, Zihan Zhang, Weiyang Liu, Han Bao · 2026-06-03 04:00

Denoise First, Orthogonalize Later: Understanding Momentum in Muon via Spectral Filtering

arXiv:2606.03899v1 Announce Type: new Abstract: Muon has recently demonstrated strong empirical performance in large language model training, but the theoretical role of momentum in Muon remains unclear. Existing analyses of Muon either remove momentum to study spectral updates i…
arXiv cs.LG TIER_1 English(EN) · Han Bao · 2026-06-02 16:54

Denoise First, Orthogonalize Later: Understanding Momentum in Muon via Spectral Filtering

Muon has recently demonstrated strong empirical performance in large language model training, but the theoretical role of momentum in Muon remains unclear. Existing analyses of Muon either remove momentum to study spectral updates in isolation, or retain momentum without explaini…