When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds

A theoretical explanation for the performance gains of the SignSGD and Muon optimizers

By the PulseAugur editorial team · [4 sources] · 2026-05-07 17:32

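Researchers have analyzed theoretically why sign-based optimization algorithms such as SignSGD and Muon can outperform standard SGD when training large models. One new study shows that SignSGD's advantage arises under specific conditions, such as sparse noise and $\ell_1$-norm stationarity, which standard SGD handles inefficiently. A second paper questions whether Muon's elaborate geometric structure is necessary, proposing that simpler approaches such as random or inverted spectra can achieve similar performance by focusing on local alignment and descent potential.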
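As a rough illustration of the contrast described above, the sketch below compares a single SGD update with a single SignSGD update on a toy gradient. The function names `sgd_step` and `signsgd_step`, the learning rate, and the toy vectors are assumptions made for this example; none of them come from the papers.

```python
import numpy as np

def sgd_step(w, grad, lr):
    """Standard SGD: move against the gradient, scaled by its magnitude."""
    return w - lr * grad

def signsgd_step(w, grad, lr):
    """SignSGD: move a fixed distance lr per coordinate, using only the sign.
    Coordinates with zero gradient stay put because np.sign(0) == 0."""
    return w - lr * np.sign(grad)

# Hypothetical toy values, chosen only to show the difference in scaling.
w = np.array([1.0, -2.0, 0.5])
grad = np.array([0.3, -4.0, 0.0])
lr = 0.1

w_sgd = sgd_step(w, grad, lr)        # step size varies with |grad_i|
w_sign = signsgd_step(w, grad, lr)   # every nonzero coordinate moves by lr

# The l1 norm of the gradient is the stationarity measure named in the summary;
# the l2 norm is the one usually used in SGD analyses.
grad_l1 = np.linalg.norm(grad, 1)
grad_l2 = np.linalg.norm(grad, 2)
```

In this toy case the large coordinate dominates the SGD step, while SignSGD moves each nonzero coordinate by the same amount. That is only the mechanical difference between the two updates; it does not reproduce the papers' lower-bound arguments.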

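Impact: Provides a theoretical basis for why certain optimizers may be better suited to training large foundation models, which may guide future research and development.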

Ranking rationale: This cluster contains two academic papers that analyze optimization algorithms for machine learning.


AI-generated summary · Google Gemini · from 4 sources.

Sources [4]

arXiv cs.LG TIER_1 English(EN) · Hongyi Tao, Dingzhi Yu, Lijun Zhang · 2026-05-08 04:00

When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds

arXiv:2605.06615v1 Announce Type: new Abstract: Sign-based optimization algorithms, such as SignSGD and Muon, have garnered significant attention for their remarkable performance in training large foundation models. Despite this empirical success, we still lack a theoretical unde…
arXiv cs.AI TIER_1 English(EN) · Lijun Zhang · 2026-05-07 17:32

When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds

Sign-based optimization algorithms, such as SignSGD and Muon, have garnered significant attention for their remarkable performance in training large foundation models. Despite this empirical success, we still lack a theoretical understanding of when and why these sign-based metho…
arXiv stat.ML TIER_1 English(EN) · Zakhar Shumaylov, Nathaël Da Costa, Peter Zaika, Bálint Mucsányi, Alex Massucco, Yoav Gelberg, Carola-Bibiane Schönlieb, Yarin Gal, Philipp Hennig · 2026-05-13 04:00

Muon Is Not That Special: Random or Inverted Spectra Work Just as Well

arXiv:2605.11181v1 Announce Type: cross Abstract: The recent empirical success of the Muon optimizer has renewed interest in non-Euclidean optimization, typically justified by similarities with second-order methods, and linear minimization oracle (LMO) theory. In this paper, we c…
arXiv stat.ML TIER_1 English(EN) · Philipp Hennig · 2026-05-11 19:42

Muon Is Not That Special: Random or Inverted Spectra Work Just as Well

The recent empirical success of the Muon optimizer has renewed interest in non-Euclidean optimization, typically justified by similarities with second-order methods, and linear minimization oracle (LMO) theory. In this paper, we challenge this geometric narrative through three co…

