New optimizers respect neural network symmetries and improve training

By the PulseAugur editorial team · [2 sources] · 2026-05-18 09:17

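Researchers have proposed a new design principle for deep learning optimizers that is compatible with the intrinsic symmetries of neural network architectures. Unlike optimizers such as Adam, which act on parameters coordinate-wise, the proposed symmetry-compatible optimizers are designed to be equivariant under the symmetry group specific to each weight block. The approach has been applied to components including embedding layers, LM heads, SwiGLU MLPs, and MoE routers, yielding novel update rules. Experiments on language models show that the new optimizers consistently improve validation loss and training stability relative to standard AdamW.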
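To make the notion of equivariance concrete, here is a minimal toy sketch in plain Python. It is not the paper's method: the summary does not give the actual update rules, so the two rules below (`sign_update` as a stand-in for a coordinate-wise optimizer step, `frobenius_update` as a simple rotation-equivariant alternative) and all function names are assumptions chosen for illustration. The check assumes a weight block whose symmetry is right-multiplication by an orthogonal matrix Q, so that gradients transform as G → GQ; a rule U is equivariant when U(GQ) = U(G)Q.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def rotation(theta):
    """2x2 rotation matrix, a simple example of an orthogonal symmetry."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def sign_update(g):
    """Coordinate-wise rule (hypothetical stand-in): each entry is treated alone."""
    return [[0.0 if x == 0 else math.copysign(1.0, x) for x in row] for row in g]


def frobenius_update(g):
    """Normalize the whole block by its Frobenius norm (illustrative only)."""
    norm = math.sqrt(sum(x * x for row in g for x in row))
    return [[x / norm for x in row] for row in g]


def equivariance_gap(update, g, q):
    """Largest entrywise difference between U(GQ) and U(G)Q; zero means equivariant."""
    lhs = update(matmul(g, q))
    rhs = matmul(update(g), q)
    return max(abs(x - y) for lr, rr in zip(lhs, rhs) for x, y in zip(lr, rr))


G = [[1.0, 2.0], [-0.5, 0.3]]   # arbitrary example gradient
Q = rotation(math.pi / 6)       # arbitrary example rotation

sign_gap = equivariance_gap(sign_update, G, Q)
frob_gap = equivariance_gap(frobenius_update, G, Q)

print(f"sign rule gap:      {sign_gap:.3f}")
print(f"Frobenius rule gap: {frob_gap:.2e}")
```

In this toy setting the sign rule leaves a clearly nonzero gap, because taking signs entry by entry depends on the chosen coordinates, while the Frobenius-normalized rule commutes with the rotation up to floating-point error, because the Frobenius norm is unchanged by orthogonal transformations.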

Impact: Introduces novel optimizer designs that improve training stability and final validation loss for language models.

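Ranking rationale: This cluster contains an academic paper that details a new theoretical principle for deep learning optimizer design, together with experimental validation.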


AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv stat.ML TIER_1 English(EN) · Tim Tsz-Kit Lau, Weijie Su · 2026-05-19 04:00

A symmetry-compatible principle for optimizer design: embeddings, LM heads, SwiGLU MLPs, and MoE routers

arXiv:2605.18106v1 (announce type: cross). Abstract: A striking geometric disparity has long persisted in the practice of deep learning. While modern neural network architectures naturally exhibit rich symmetry and equivariance properties, popular optimizers such as Adam and its var…
arXiv stat.ML TIER_1 English(EN) · Weijie Su · 2026-05-18 09:17

A symmetry-compatible principle for optimizer design: embeddings, LM heads, SwiGLU MLPs, and MoE routers

A striking geometric disparity has long persisted in the practice of deep learning. While modern neural network architectures naturally exhibit rich symmetry and equivariance properties, popular optimizers such as Adam and its variants operate inherently coordinate-wise, renderin…
