PulseAugur

New optimizer decouples weight magnitude and direction for improved neural network training

Researchers have introduced a new optimization technique called Magnitude-Direction (MD) Decoupling, which separates the magnitude and the direction of weight vectors in neural networks. Each component gets its own learning rate, which the authors argue can make training more stable and efficient. According to the paper, MD Decoupling improves on existing optimizers such as Adam and Muon, including on large Mixture-of-Experts models, and removes the need for conventional techniques such as weight decay and learning-rate warmup.
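To make the idea concrete, here is a minimal sketch of magnitude-direction decoupling for a single weight vector. It is not the paper's algorithm, which this summary does not spell out. It assumes a weight-normalization-style reparameterization w = g·d with g = ‖w‖ and ‖d‖ = 1, plain gradient descent, and two hypothetical learning rates, `lr_mag` and `lr_dir`. The direction step uses the gradient projected onto the tangent space of the unit sphere and is then renormalized, so d always stays a unit vector.

```python
import math
from typing import List, Tuple

Vector = List[float]


def norm(v: Vector) -> float:
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def decompose(w: Vector) -> Tuple[float, Vector]:
    """Split w into a magnitude g = ||w|| and a unit direction d = w / ||w||."""
    g = norm(w)
    if g == 0.0:
        raise ValueError("a zero vector has no direction")
    return g, [x / g for x in w]


def recompose(g: float, d: Vector) -> Vector:
    """Rebuild the weight vector w = g * d."""
    return [g * x for x in d]


def md_step(g: float, d: Vector, grad_w: Vector,
            lr_mag: float, lr_dir: float) -> Tuple[float, Vector]:
    """One gradient step with separate learning rates for magnitude and direction.

    Hypothetical sketch, not the paper's update rule. For w = g * d:
      dL/dg = <grad_w, d>
      dL/dd = g * grad_w, projected onto the tangent space of the unit sphere at d
    """
    # Magnitude gradient: component of grad_w along the current direction.
    grad_g = sum(gw * di for gw, di in zip(grad_w, d))
    # Direction gradient with its radial part removed (tangent-space projection).
    grad_d = [g * (gw - grad_g * di) for gw, di in zip(grad_w, d)]

    new_g = g - lr_mag * grad_g
    moved = [di - lr_dir * gdi for di, gdi in zip(d, grad_d)]
    # grad_d is orthogonal to d, so ||moved|| >= 1 and the division is safe.
    m = norm(moved)
    new_d = [x / m for x in moved]
    return new_g, new_d


def toy_loss(w: Vector, target: Vector) -> float:
    """Quadratic loss 0.5 * ||w - target||^2, used only for the demo."""
    return 0.5 * sum((wi - ti) ** 2 for wi, ti in zip(w, target))


if __name__ == "__main__":
    target = [0.0, 2.0]
    g, d = decompose([3.0, 4.0])
    for _ in range(100):
        w = recompose(g, d)
        grad_w = [wi - ti for wi, ti in zip(w, target)]  # gradient of toy_loss
        g, d = md_step(g, d, grad_w, lr_mag=0.1, lr_dir=0.1)
    print("final loss:", toy_loss(recompose(g, d), target))
```

The design choice worth noting is that `lr_mag` only scales how fast the norm changes and `lr_dir` only scales how fast the vector rotates, which is the independent-learning-rate knob the summary describes; how the paper sets these, or how it combines them with Adam- or Muon-style preconditioning, is not covered here.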

IMPACT Could lead to more stable and efficient training of large neural networks, potentially accelerating research and development.

RANK_REASON Academic paper introducing a novel optimization technique for neural networks.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. arXiv cs.LG · TIER_1 · English (EN) · Martin Jaggi

    Improving Neural Network Training by Decoupling the Magnitude and Direction of Weight Vectors

    Modern neural network training relies on optimizers such as Adam and Muon, which act on each weight matrix as a single object. Yet every weight matrix carries two distinct quantities, a magnitude and a direction, and all optimizers stepping in the matrix as a who…