PulseAugur Brief

Last 24 hours, 223 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. Muon$^2$: Boosting Muon via Adaptive Second-Moment Preconditioning

    Researchers have developed Muon$^2$, an enhanced version of the Muon optimizer for large-scale foundation model pre-training. Muon's update relies on an iterative Newton-Schulz orthogonalization step that adds computational cost. Muon$^2$ applies Adam-style adaptive second-moment preconditioning before that step, which the authors report improves both efficiency and model quality. In experiments on GPT, LLaMA, and Mixture-of-Experts models of up to 13B parameters, Muon$^2$ cut the number of Newton-Schulz iterations by 40% and saved up to a quarter of training time relative to Muon while reaching similar loss.

    IMPACT: Muon$^2$ could make pre-training large foundation models more efficient, lowering compute costs and shortening development cycles.
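The summary describes an ordering: scale the update by Adam-style second moments first, then orthogonalize it with Newton-Schulz iterations. The Python sketch here illustrates that ordering on toy matrices and is not the authors' implementation. Assumptions: it uses the classic cubic Newton-Schulz iteration X <- 1.5X - 0.5XX^T X rather than Muon's tuned coefficients; it assumes the preconditioner divides the momentum elementwise by the square root of an exponential moving average of squared gradients, as Adam does; and the helper names (`precondition`, `newton_schulz`, `muon2_like_update`) are hypothetical. With the cubic iteration, a small singular value grows by roughly 1.5x per step, so the iteration count rises with the matrix's condition number. That dependence is what a preconditioner can act on.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b on nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def frobenius(a: Matrix) -> float:
    return math.sqrt(sum(x * x for row in a for x in row))


def orthogonality_error(x: Matrix) -> float:
    """||G - I||_F, where G is the Gram matrix on the smaller side of x."""
    gram = matmul(transpose(x), x) if len(x) >= len(x[0]) else matmul(x, transpose(x))
    n = len(gram)
    return frobenius([[gram[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)])


def newton_schulz(g: Matrix, tol: float = 1e-6, max_iters: int = 100) -> tuple[Matrix, int]:
    """Classic cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X.

    Drives every nonzero singular value of g toward 1 (the orthogonal polar factor)
    and returns the result plus the number of iterations used.
    A rank-deficient g never meets tol and runs to max_iters.
    """
    norm = frobenius(g)
    if norm == 0.0:
        return [row[:] for row in g], 0
    x = [[v / norm for v in row] for row in g]  # singular values now lie in (0, 1]
    for k in range(max_iters):
        if orthogonality_error(x) < tol:
            return x, k
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(ra, rb)] for ra, rb in zip(x, xxtx)]
    return x, max_iters


def precondition(momentum: Matrix, grad: Matrix, v: Matrix,
                 beta2: float = 0.999, eps: float = 1e-8) -> tuple[Matrix, Matrix]:
    """Hypothetical Adam-style second-moment scaling (assumed form, not the paper's).

    Updates the EMA v of squared gradients and returns momentum / (sqrt(v) + eps).
    Adam's bias correction is omitted: it divides v by a single scalar, which the
    Frobenius normalization in newton_schulz cancels (up to eps).
    """
    v_new = [[beta2 * vi + (1.0 - beta2) * gi * gi for vi, gi in zip(rv, rg)]
             for rv, rg in zip(v, grad)]
    scaled = [[mi / (math.sqrt(vi) + eps) for mi, vi in zip(rm, rv)]
              for rm, rv in zip(momentum, v_new)]
    return scaled, v_new


def muon2_like_update(momentum: Matrix, grad: Matrix, v: Matrix) -> tuple[Matrix, Matrix, int]:
    """Hypothetical step: precondition first, then orthogonalize."""
    scaled, v_new = precondition(momentum, grad, v)
    update, iters = newton_schulz(scaled)
    return update, v_new, iters


# Toy comparison: an ill-conditioned 2x2 momentum with singular values 2 and 0.02.
m = [[2.0, 0.0], [0.0, 0.02]]
_, raw_iters = newton_schulz(m)
_, _, pre_iters = muon2_like_update(m, m, [[0.0, 0.0], [0.0, 0.0]])
print(raw_iters, pre_iters)
```

In this toy diagonal case, the elementwise scaling happens to equalize the two singular values, so orthogonalization needs fewer steps than on the raw matrix. The example illustrates the mechanism only. It does not reproduce the paper's reported 40% reduction, and real gradient matrices will not be equalized this neatly.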