Scaling the Memory of Balanced Adam

New research examines the memory and noise dynamics of the Adam optimizer

By the PulseAugur editorial team · [2 sources] · 2026-05-11 04:00

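Two new research papers examine finer points of the Adam optimizer, a widely used tool in deep learning. The first proposes a "refresh rule" for the momentum parameter of balanced Adam (Adam with its two momentum parameters tied, β1 = β2), arguing that this parameter should scale with the size of the training data to keep performance and robustness optimal across scales. The second studies how mini-batch noise, which depends on the batch size and on Adam's hyperparameters, shapes the optimizer's implicit bias and generalization, particularly in multi-epoch training.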
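The "memory" in the first paper's title refers to how far back Adam's exponential moving averages reach. As a minimal sketch, assuming the common rule of thumb that an EMA with decay β averages over roughly 1/(1 − β) steps, the helpers below convert between β and that horizon. The function names `memory_horizon`, `beta_for_horizon`, and `half_life` are hypothetical, and the code does not reproduce the paper's rule, which this summary does not state.

```python
import math


def memory_horizon(beta: float) -> float:
    """Approximate averaging window, in steps, of an EMA with decay beta: 1 / (1 - beta)."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    return 1.0 / (1.0 - beta)


def beta_for_horizon(horizon: float) -> float:
    """Inverse map: the decay beta whose approximate averaging window is `horizon` steps."""
    if horizon < 1.0:
        raise ValueError("horizon must be at least 1 step")
    return 1.0 - 1.0 / horizon


def half_life(beta: float) -> float:
    """Steps after which a gradient's weight in the EMA has halved: ln(0.5) / ln(beta)."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1)")
    return math.log(0.5) / math.log(beta)


# Usage: longer horizons correspond to betas closer to 1.
for h in (10.0, 100.0, 1000.0):
    b = beta_for_horizon(h)
    print(f"horizon={h:>6.0f}  beta={b:.3f}  half-life={half_life(b):.1f} steps")
```

Read this way, letting β grow with the training set amounts to letting the horizon H = 1/(1 − β) grow with it: horizons of 10, 100, and 1000 steps correspond to β = 0.9, 0.99, and 0.999. The precise growth rate is what the paper's rule specifies; it is not given in this summary.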

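Impact: These studies offer theoretical insight and practical tuning strategies for the Adam optimizer and could improve training efficiency and generalization across a range of deep learning tasks.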

Ranking rationale: two academic papers posted on arXiv that cover theoretical and experimental aspects of the Adam optimizer.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.LG · TIER_1 · English · Enrique S. Quintana-Ortí · 2026-05-11 07:35

Scaling the Memory of Balanced Adam

Recent evidence suggests that Adam performs robustly when its momentum parameters are tied, $β_1=β_2$, reducing the optimizer to a single remaining parameter. However, the value of this parameter is still poorly understood. We argue that, in balanced Adam, $β$ should not be treat…
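
To make the tying concrete, here is a minimal sketch of Adam with β1 = β2 = β, written in plain Python over a flat list of parameters. It follows the standard Adam update with bias correction; the class name `BalancedAdam` and the default β = 0.95 are illustrative assumptions, not values taken from the paper.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class BalancedAdam:
    """Adam with tied momentum parameters (beta1 == beta2 == beta). Hypothetical helper class."""
    lr: float = 1e-3
    beta: float = 0.95  # illustrative default only, not a recommendation from the paper
    eps: float = 1e-8
    step_count: int = 0
    m: List[float] = field(default_factory=list)  # first-moment EMA
    v: List[float] = field(default_factory=list)  # second-moment EMA

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        if not self.m:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.step_count += 1
        b = self.beta
        # With beta1 == beta2, both moments share one bias-correction factor.
        bias = 1.0 - b ** self.step_count
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = b * self.m[i] + (1.0 - b) * g
            self.v[i] = b * self.v[i] + (1.0 - b) * g * g
            m_hat = self.m[i] / bias
            v_hat = self.v[i] / bias
            new_params.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_params


# Usage: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
opt = BalancedAdam(lr=0.01, beta=0.95)
x = [0.0]
for _ in range(5000):
    x = opt.step(x, [2.0 * (x[0] - 3.0)])
print(x[0])  # should end close to 3.0
```

Because both moments share β, they also share the bias-correction factor 1 − β^t, which leaves β as the only momentum hyperparameter to tune alongside the learning rate.
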

arXiv stat.ML · TIER_1 · English · Matias D. Cattaneo, Boris Shigida · 2026-05-11 04:00

The Effect of Mini-Batch Noise on Adam's Implicit Bias

arXiv:2602.01642v2 · Abstract: With limited high-quality data and growing compute, multi-epoch training is gaining back its importance across sub-areas of deep learning. Adam(W), versions of which are go-to optimizers for many tasks such as next token p…
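
The second paper starts from the fact that mini-batch noise depends on batch size. The sketch below uses a hypothetical synthetic 1-D least-squares dataset, reshuffles it every epoch (as in multi-epoch training), splits it into disjoint batches, and measures the variance of the mini-batch gradient for several batch sizes. It also includes the standard stationary-variance factor (1 − β)/(1 + β) for an EMA of i.i.d. noise, as one simple way Adam's β interacts with that noise. All function names are hypothetical, and this illustrates the ingredients only, not the paper's analysis of implicit bias.

```python
import random
import statistics
from typing import List


def per_example_grads(n: int, w: float, seed: int = 0) -> List[float]:
    """Per-example gradients of 0.5 * (w * x - y)**2 on synthetic data y = 2x + noise (hypothetical)."""
    rng = random.Random(seed)
    grads = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 2.0 * x + rng.gauss(0.0, 0.5)
        grads.append((w * x - y) * x)  # derivative of the squared loss with respect to w
    return grads


def minibatch_grad_variance(grads: List[float], batch_size: int,
                            epochs: int = 50, seed: int = 1) -> float:
    """Variance of the mini-batch mean gradient when every epoch reshuffles the data
    and splits it into disjoint batches (sampling without replacement within an epoch)."""
    rng = random.Random(seed)
    idx = list(range(len(grads)))
    means = []
    for _ in range(epochs):
        rng.shuffle(idx)
        for start in range(0, len(idx) - batch_size + 1, batch_size):
            batch = idx[start:start + batch_size]
            means.append(sum(grads[i] for i in batch) / batch_size)
    return statistics.pvariance(means)


def ema_noise_factor(beta: float) -> float:
    """Stationary variance of an EMA with decay beta over i.i.d. noise, relative to the noise variance."""
    return (1.0 - beta) / (1.0 + beta)


# Usage: larger batches give a less noisy mini-batch gradient.
grads = per_example_grads(4096, w=0.0)
for b in (8, 32, 128):
    print(b, minibatch_grad_variance(grads, b))
print("EMA factor at beta=0.9:", ema_noise_factor(0.9))
```

With 4096 examples, the measured variance should fall roughly in proportion to 1/B, slightly faster because sampling within an epoch is without replacement. Adam's first-moment estimate then averages this noise further, by a factor that shrinks toward zero as β approaches 1 when the noise is i.i.d.
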
