PulseAugur

Muown optimizer improves LLM training by controlling row-norm drift

Researchers have developed Muown, an optimization method for training large language models. It targets a known weakness of the Muon optimizer: without decoupled weight decay, the spectral norms of weight matrices drift upward over training. By treating row-magnitude vectors as explicit variables, Muown reportedly improves perplexity and learning-rate stability across model scales and outperforms existing optimizers such as AdamW and Lion.
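The summary does not state Muown's update rule, so the following is not the paper's algorithm. It is a minimal sketch, in plain Python, of the two quantities the summary mentions: the row magnitudes of a weight matrix and its spectral norm. The function names (`row_norms`, `split_rows`, `merge_rows`, `spectral_norm`) and the example matrix are hypothetical, and the factorization into magnitudes and unit-norm directions is only one assumed reading of "treating row-magnitude vectors as explicit variables". The sketch relies on a standard fact: the spectral norm is at least the largest row norm and at most the Frobenius norm, so bounding row norms also bounds the spectral norm.

```python
import math

def row_norms(W):
    """Euclidean norm of each row of W (a list of equal-length lists)."""
    return [math.sqrt(sum(x * x for x in row)) for row in W]

def split_rows(W):
    """Factor W into magnitudes g and unit-norm directions V with W[i] = g[i] * V[i].

    Assumes every row is non-zero (hypothetical parameterization, not the paper's).
    """
    g = row_norms(W)
    V = [[x / n for x in row] for row, n in zip(W, g)]
    return g, V

def merge_rows(g, V):
    """Rebuild the matrix from row magnitudes and row directions."""
    return [[gi * x for x in row] for gi, row in zip(g, V)]

def spectral_norm(W, iters=200):
    """Largest singular value of W, estimated by power iteration on W^T W."""
    rows, cols = len(W), len(W[0])
    v = [1.0] * cols
    for _ in range(iters):
        u = [sum(w * x for w, x in zip(row, v)) for row in W]              # u = W v
        v = [sum(W[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # v = W^T u
        scale = math.sqrt(sum(x * x for x in v))
        v = [x / scale for x in v]
    u = [sum(w * x for w, x in zip(row, v)) for row in W]
    return math.sqrt(sum(x * x for x in u))

# Illustrative 2x2 matrix, not taken from the paper.
W = [[3.0, 4.0], [0.0, 5.0]]
g, V = split_rows(W)
sigma = spectral_norm(W)
frobenius = math.sqrt(sum(n * n for n in g))

# max row norm <= spectral norm <= Frobenius norm
print(max(g), sigma, frobenius)
```

Tracking `g` over training steps would expose row-norm growth directly, while `spectral_norm` shows whether the matrix as a whole is drifting. How Muown actually updates these quantities is described in the paper, not here.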

IMPACT Improves LLM training efficiency and stability, potentially enabling larger models and faster development cycles.

RANK_REASON The cluster contains an academic paper detailing a new optimization method for language model training.


AI-generated summary · Google Gemini · from 2 sources.


COVERAGE [2]

  1. Hugging Face Daily Papers · TIER_1 · English (EN)

    Muown: Row-Norm Control for Muon Optimization

    Muon has emerged as a strong competitor to AdamW for language model pre-training, yet its behavior at scale is sensitive to weight decay. Recent work has observed that, for Muon without decoupled weight decay, the spectral norm of weight matrices drifts upward over training. Thro…

  2. arXiv cs.LG · TIER_1 · English (EN) · Niao He

    Muown: Row-Norm Control for Muon Optimization

    Muon has emerged as a strong competitor to AdamW for language model pre-training, yet its behavior at scale is sensitive to weight decay. Recent work has observed that, for Muon without decoupled weight decay, the spectral norm of weight matrices drifts upward over training. Thro…