PulseAugur
English(EN) When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds

Theory Explains the Performance Gains of the SignSGD and Muon Optimizers

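Researchers have analyzed theoretically why sign-based optimization algorithms such as SignSGD and Muon can outperform standard SGD when training large models. One new study shows that SignSGD's advantage arises under specific conditions, such as sparse noise and $\ell_1$-norm stationarity, which standard SGD handles inefficiently. A second paper questions whether Muon's elaborate geometric structure is necessary, proposing that simpler alternatives such as random or inverted spectra can reach similar performance by focusing on local alignment and descent potential.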
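
For readers unfamiliar with the update rules named above, here is a minimal NumPy sketch. It is not code from either paper. The function names, the toy gradient, and the learning rate are illustrative assumptions, and the exact SVD in `orthogonalized_step` stands in for the approximate orthogonalization that Muon implementations typically use (momentum is omitted as well).

```python
import numpy as np

def sgd_step(w, grad, lr):
    """Standard SGD: step against the gradient, scaled by its magnitude."""
    return w - lr * grad

def signsgd_step(w, grad, lr):
    """SignSGD: keep only the sign of each gradient coordinate."""
    return w - lr * np.sign(grad)

def orthogonalized_step(W, G, lr):
    """Matrix analogue of the sign: replace every singular value of G with 1.

    Assumption: an exact SVD is used here purely for clarity.
    """
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return W - lr * (U @ Vt)

# Hypothetical toy gradient with one dominant coordinate.
grad = np.array([100.0, -0.01, 0.0, 2.0])
w = np.zeros(4)

print("SGD step:    ", sgd_step(w, grad, lr=0.1))
print("SignSGD step:", signsgd_step(w, grad, lr=0.1))
# The l1 and l2 norms of the same gradient, the two quantities
# that stationarity can be measured in.
print("l1 norm:", np.linalg.norm(grad, 1), "l2 norm:", np.linalg.norm(grad, 2))
```

In the SGD step the largest coordinate dominates the update, whereas the SignSGD step moves every nonzero coordinate by the same amount. That difference is the reason the two methods are naturally compared under different norms.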

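Impact: Provides a theoretical basis for why certain optimizers may be better suited to training large foundation models, which could guide future research and development.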

Ranking rationale: This cluster contains two academic papers that analyze machine learning optimization algorithms.

AI-generated summary · Google Gemini · from 4 sources.

Sources [4]

  1. arXiv cs.LG TIER_1 English(EN) · Hongyi Tao, Dingzhi Yu, Lijun Zhang ·

    When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds

    arXiv:2605.06615v1. Sign-based optimization algorithms, such as SignSGD and Muon, have garnered significant attention for their remarkable performance in training large foundation models. Despite this empirical success, we still lack a theoretical unde…

  2. arXiv cs.AI TIER_1 English(EN) · Lijun Zhang ·

    When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds

    Sign-based optimization algorithms, such as SignSGD and Muon, have garnered significant attention for their remarkable performance in training large foundation models. Despite this empirical success, we still lack a theoretical understanding of when and why these sign-based metho…

  3. arXiv stat.ML TIER_1 English(EN) · Zakhar Shumaylov, Nathaël Da Costa, Peter Zaika, Bálint Mucsányi, Alex Massucco, Yoav Gelberg, Carola-Bibiane Schönlieb, Yarin Gal, Philipp Hennig ·

    Muon is Not That Special: Random or Inverted Spectra Work Just as Well

    arXiv:2605.11181v1. The recent empirical success of the Muon optimizer has renewed interest in non-Euclidean optimization, typically justified by similarities with second-order methods, and linear minimization oracle (LMO) theory. In this paper, we c…

  4. arXiv stat.ML TIER_1 English(EN) · Philipp Hennig ·

    Muon is Not That Special: Random or Inverted Spectra Work Just as Well

    The recent empirical success of the Muon optimizer has renewed interest in non-Euclidean optimization, typically justified by similarities with second-order methods, and linear minimization oracle (LMO) theory. In this paper, we challenge this geometric narrative through three co…