PulseAugur

New optimizers adapt to model geometry for better training

Two recent papers propose optimizers that adapt the update geometry used when training deep learning models. The first introduces a data-driven criterion that dynamically selects the update geometry for each neural network layer, interpolating between SGD and the Muon optimizer with minimal runtime overhead. The second proposes MiMuon, a hybrid optimizer that combines Muon and SGD; according to its authors, it achieves a better theoretical generalization error for large models than Muon alone while keeping a similar convergence rate.

Summary written by gemini-2.5-flash-lite from 4 sources.

IMPACT Introduces novel optimization methods that could improve training efficiency and generalization for large AI models.

RANK_REASON Two research papers published on arXiv detailing novel optimization algorithms for deep learning models.
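
To make the idea of interpolating between SGD and Muon concrete, here is a minimal NumPy sketch of the textbook steepest-descent direction under a Schatten-p norm constraint. It is an illustration under stated assumptions, not the paper's algorithm: the function name `schatten_lmo_direction` is hypothetical, an explicit SVD stands in for whatever cheaper routine the authors use, and the data-driven criterion for choosing p is not reproduced. With p = 2 the direction is the Frobenius-normalized gradient (normalized SGD); with p = ∞ every singular value is set to 1, which is the orthogonalized update Muon targets.

```python
import numpy as np

def schatten_lmo_direction(grad: np.ndarray, p: float) -> np.ndarray:
    """Return the unit Schatten-p matrix X that maximizes <grad, X>.

    Hypothetical helper for illustration only. Requires p > 1 (or p = inf).
    A descent step would move the weights along the negative of this matrix.
    """
    U, s, Vt = np.linalg.svd(grad, full_matrices=False)
    if s.max() == 0.0:
        return np.zeros_like(grad)  # zero gradient: no direction to return
    if np.isinf(p):
        # Spectral-norm geometry: all singular values become 1.
        return U @ Vt
    q = p / (p - 1.0)  # dual exponent, 1/p + 1/q = 1
    # Hoelder equality case: weights proportional to s**(q - 1),
    # scaled so that the Schatten-p norm of the result is exactly 1.
    w = s ** (q - 1.0) / np.sum(s ** q) ** ((q - 1.0) / q)
    return (U * w) @ Vt

rng = np.random.default_rng(0)
G = rng.standard_normal((4, 3))
sgd_like = schatten_lmo_direction(G, 2.0)      # equals G / ||G||_F
muon_like = schatten_lmo_direction(G, np.inf)  # equals U @ V^T
between = schatten_lmo_direction(G, 4.0)       # singular values partly flattened
```

Intermediate values of p flatten the singular-value spectrum only partially, which is one way to read "interpolating" between the two optimizers.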


COVERAGE [4]

  1. arXiv cs.AI TIER_1 · Mathieu Serrurier ·

    From SGD to Muon: Adaptive Optimization via Schatten-p Norms

    Modern optimizers, like Muon, impose matrix-wise geometry constraints on their updates. These matrix-wise constraints can be unified under Linear Minimization Oracle (LMO) theory. However, all current methods impose fixed LMO geometries for the update rules, chosen by-design or e…

  2. Hugging Face Daily Papers TIER_1 ·

    Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR

    Muon is a matrix-aware optimizer that leverages Newton-Schulz (NS) iterations to enforce spectral gradient orthogonalization by driving all singular values of the momentum matrix toward 1. While this uniform spectral whitening enhances exploration and outperforms AdamW in LLM pre…

  3. arXiv stat.ML TIER_1 · Feihu Huang, Yuning Luo, Songcan Chen ·

    MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models

    arXiv:2605.19619v1. Matrix-structured parameters frequently appear in many artificial intelligence models such as large language models. More recently, an efficient Muon optimizer is designed for matrix parameters of large-scale models, and shows mar…

  4. arXiv stat.ML TIER_1 · Songcan Chen ·

    MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models

    Matrix-structured parameters frequently appear in many artificial intelligence models such as large language models. More recently, an efficient Muon optimizer is designed for matrix parameters of large-scale models, and shows markedly faster convergence than the vector-wise algo…
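
The Newton-Schulz orthogonalization described in the coverage above, which drives all singular values of the momentum matrix toward 1, can be sketched in NumPy as follows. This is a hedged illustration using the classical cubic Newton-Schulz iteration, swapped in for clarity; Muon implementations commonly use a tuned polynomial with only a few steps, which is not reproduced here. The function name `newton_schulz_orthogonalize` and the default step count are assumptions made for this example.

```python
import numpy as np

def newton_schulz_orthogonalize(M: np.ndarray, steps: int = 50) -> np.ndarray:
    """Approximate the orthogonal polar factor U @ V^T of a full-rank matrix M.

    Hypothetical helper for illustration only. Uses the classical cubic
    iteration X <- 1.5 X - 0.5 (X X^T) X, which maps each singular value
    sigma to 1.5*sigma - 0.5*sigma**3 and converges to 1 for sigma in (0, sqrt(3)).
    """
    # The Frobenius norm bounds the spectral norm, so after scaling every
    # singular value lies in (0, 1].
    X = M / (np.linalg.norm(M) + 1e-12)
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T  # wide orientation keeps the Gram matrix X @ X.T small
    for _ in range(steps):
        A = X @ X.T
        X = 1.5 * X - 0.5 * A @ X
    return X.T if transposed else X

rng = np.random.default_rng(0)
momentum = rng.standard_normal((5, 3))
Q = newton_schulz_orthogonalize(momentum)
U, _, Vt = np.linalg.svd(momentum, full_matrices=False)
```

The iteration needs only matrix multiplications, which is why this family of methods is attractive on accelerators compared with computing an SVD at every step. Singular values that are extremely small relative to the largest one need more steps to reach 1.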