PulseAugur
Improving Neural Network Training by Decoupling the Magnitude and Direction of Weight Vectors

New optimizer decouples weight magnitude and direction to improve neural network training

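Researchers have introduced a new optimization technique called magnitude-direction (MD) decoupling, which separates the magnitude and the direction of the weight vectors in a neural network. The approach lets the two components have independent learning rates, which may make training more stable and more efficient. MD decoupling shows improvements over existing optimizers such as Adam and Muon, even on large mixture-of-experts models, and removes the need for conventional techniques such as weight decay and warmup.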
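To make the idea of independent learning rates concrete, here is a minimal NumPy sketch. It is not the paper's algorithm, whose details are not given in this summary. It assumes a plain SGD update on a row-wise reparameterization W = mag * direction, in the spirit of weight normalization; the function names and the learning rates `lr_mag` and `lr_dir` are hypothetical.

```python
import numpy as np


def split_magnitude_direction(W, eps=1e-12):
    """Split each row of W into a scalar magnitude and a unit-norm direction."""
    mag = np.linalg.norm(W, axis=1, keepdims=True)
    direction = W / np.maximum(mag, eps)
    return mag, direction


def decoupled_sgd_step(mag, direction, grad_W, lr_mag=1e-2, lr_dir=1e-1, eps=1e-12):
    """One SGD step on W = mag * direction with separate learning rates.

    Assumption: this is an illustrative update rule, not the method from the paper.
    """
    # Chain rule: the magnitude gradient is the gradient component along the direction.
    radial = np.sum(grad_W * direction, axis=1, keepdims=True)
    grad_mag = radial
    # Direction gradient: scale by the magnitude and drop the radial component,
    # so the step is tangent to the unit sphere.
    grad_dir = mag * (grad_W - radial * direction)

    new_mag = mag - lr_mag * grad_mag
    new_dir = direction - lr_dir * grad_dir
    # Project back onto the unit sphere so the direction stays unit-norm.
    new_dir = new_dir / np.maximum(np.linalg.norm(new_dir, axis=1, keepdims=True), eps)
    return new_mag, new_dir


# Usage on random data (shapes chosen only for illustration).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
grad_W = rng.normal(size=(4, 8))

mag, direction = split_magnitude_direction(W)
new_mag, new_dir = decoupled_sgd_step(mag, direction, grad_W)
W_new = new_mag * new_dir  # recombined weight matrix
```

Because the direction is renormalized after every step, the row norms of `W_new` are controlled only by the magnitude update. That is one intuition for why a scheme of this kind might make weight decay less necessary, though this sketch does not demonstrate that claim.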

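Impact: May lead to more stable and more efficient training of large neural networks, which could accelerate research and development.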

Ranking rationale: An academic paper introducing a new optimization technique for neural networks.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · From 2 sources.

Sources [2]

  1. arXiv cs.LG TIER_1 English(EN) · Alexander Hägele, Alejandro Hernández-Cano, Atli Kosson, Martin Jaggi

    Improving Neural Network Training by Decoupling the Magnitude and Direction of Weight Vectors

    arXiv:2606.25971v1. Abstract: Modern neural network training relies on optimizers such as Adam and Muon which act on each weight matrix as a single object. Yet every weight matrix carries two distinct quantities -- a magnitude and a direction -- an…

  2. arXiv cs.LG TIER_1 English(EN) · Martin Jaggi

    Improving Neural Network Training by Decoupling the Magnitude and Direction of Weight Vectors

    Modern neural network training relies on optimizers such as Adam and Muon which act on each weight matrix as a single object. Yet every weight matrix carries two distinct quantities -- a magnitude and a direction -- and all optimizers stepping in the matrix as a who…