Open Problem: Is AdamW Effective Under Heavy-Tailed Noise?

Open Problem: The Effectiveness of the AdamW Optimizer Under Heavy-Tailed Noise in Large Language Models (LLMs)

By the PulseAugur editorial team · [1 source] · 2026-06-22 17:58

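A recent paper poses an open problem about the effectiveness of the AdamW optimizer when training large language models (LLMs) under heavy-tailed noise. Although AdamW is widely used, its theoretical understanding is limited to finite-variance settings, even though empirical evidence suggests that heavy-tailed noise is common in LLM pretraining. The paper asks whether AdamW can converge in this setting and contrasts it with other optimizers that have been shown to converge under heavy-tailed noise, such as Lion and Muon, while also providing a weighted-metric benchmark and a lower-bound mechanism.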
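For readers who want a concrete picture of the setting, here is a minimal sketch that is not taken from the paper. It applies the standard AdamW update (Adam with decoupled weight decay) to a scalar parameter whose gradient is corrupted by Student-t noise; with 1 < df <= 2 that noise has a finite mean but infinite variance. The quadratic objective, the noise model, the hyperparameter values, and the helper names `student_t`, `adamw_step`, and `run` are all assumptions chosen for illustration, and a toy run like this says nothing about whether the open problem resolves one way or the other.

```python
import math
import random


def student_t(rng, df):
    """Draw one Student-t sample; for df <= 2 its variance is infinite."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)


def adamw_step(x, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW step on a scalar parameter (illustrative hyperparameters)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    x = x - lr * weight_decay * x               # decoupled weight decay
    x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
    return x, m, v


def run(df, steps=2000, seed=0):
    """Minimize 0.5 * x**2 with gradients perturbed by Student-t noise."""
    rng = random.Random(seed)
    x, m, v = 5.0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = x + student_t(rng, df)  # true gradient plus heavy-tailed noise
        x, m, v = adamw_step(x, grad, m, v, t)
    return x


if __name__ == "__main__":
    # df=1.5: infinite-variance noise; df=30: close to Gaussian noise.
    for df in (1.5, 30.0):
        print(df, run(df))
```

Comparing the two runs across several seeds gives a rough feel for how occasional extreme gradients interact with the second-moment estimate, which is the kind of behavior the finite-variance theory does not cover.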

Impact: Clarifies the theoretical limitations of a widely used LLM training optimizer, which may guide future research into more robust methods.

Ranking rationale: This cluster contains an academic paper detailing an open problem in machine learning optimization.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv stat.ML · Lijun Zhang · 2026-06-22 17:58

Open Problem: Is AdamW Effective Under Heavy-Tailed Noise?

AdamW is the de facto optimizer for training large language models (LLMs), yet the theory behind it still lives mostly in finite-variance regimes. This is increasingly unsatisfying, as empirical evidence indicates that stochastic gradient noise in LLM pretraining is typically hea…

