New optimizers adapt to model geometry for better training

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 4 sources

Researchers have proposed new adaptive optimization techniques for deep learning models. One paper introduces a data-driven criterion that dynamically selects the update geometry for each neural network layer, interpolating between SGD and the Muon optimizer with minimal runtime overhead. Another paper proposes MiMuon, a hybrid optimizer combining Muon and SGD, which in theory offers a lower generalization error for large models than Muon alone while keeping a similar convergence rate.
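As a rough illustration of what interpolating between SGD and Muon can mean, the Schatten-p family named in the first paper's title gives a standard closed form for the Linear Minimization Oracle over a unit norm ball. The notation below (G for a layer's gradient or momentum matrix, its SVD, and the conjugate exponent q) is assumed here for the sketch. It is not taken from the paper, whose own parametrization and selection criterion are not shown in the excerpts on this page.

```latex
% Assumed notation: G = U \operatorname{diag}(s) V^\top is the SVD of a layer's
% gradient (or momentum) matrix, and 1/p + 1/q = 1.
\[
\operatorname{LMO}_p(G)
  = \arg\min_{\|X\|_{S_p} \le 1} \langle G, X \rangle
  = -\,\frac{U \operatorname{diag}\!\left(s^{\,q-1}\right) V^\top}{\|s\|_q^{\,q-1}}
\]
% Two endpoints (the second assumes all singular values in s are nonzero):
\[
p = 2:\quad \operatorname{LMO}_2(G) = -\,\frac{G}{\|G\|_F},
\qquad
p = \infty:\quad \operatorname{LMO}_\infty(G) = -\,U V^\top
\]
```

At p = 2 the direction is the normalized gradient, as in normalized SGD. At p = ∞ every singular value is set to 1, which is the orthogonalized update that Muon approximates. Intermediate values of p flatten the singular value spectrum only partially.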


IMPACT Introduces novel optimization methods that could improve training efficiency and generalization for large AI models.

RANK_REASON Two research papers published on arXiv detailing novel optimization algorithms for deep learning models.



COVERAGE [4]

arXiv cs.AI TIER_1 · Mathieu Serrurier · 2026-05-19 12:47

From SGD to Muon: Adaptive Optimization via Schatten-p Norms

Modern optimizers, like Muon, impose matrix-wise geometry constraints on their updates. These matrix-wise constraints can be unified under Linear Minimization Oracle (LMO) theory. However, all current methods impose fixed LMO geometries for the update rules, chosen by-design or e…

Hugging Face Daily Papers TIER_1 · 2026-05-19 03:00

Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR

Muon is a matrix-aware optimizer that leverages Newton-Schulz (NS) iterations to enforce spectral gradient orthogonalization by driving all singular values of the momentum matrix toward 1. While this uniform spectral whitening enhances exploration and outperforms AdamW in LLM pre…

arXiv stat.ML TIER_1 · Feihu Huang, Yuning Luo, Songcan Chen · 2026-05-20 04:00

MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models

arXiv:2605.19619v1. Matrix-structured parameters frequently appear in many artificial intelligence models such as large language models. More recently, an efficient Muon optimizer is designed for matrix parameters of large-scale models, and shows mar…

arXiv stat.ML TIER_1 · Songcan Chen · 2026-05-19 09:56

MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models

Matrix-structured parameters frequently appear in many artificial intelligence models such as large language models. More recently, an efficient Muon optimizer is designed for matrix parameters of large-scale models, and shows markedly faster convergence than the vector-wise algo…
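The Newton-Schulz orthogonalization described in the second entry above can be sketched in a few lines. This is a minimal, dependency-free illustration and not code from any of the listed papers. It uses the textbook cubic iteration X ← 1.5·X − 0.5·X·Xᵀ·X rather than the tuned polynomial a practical Muon implementation would likely use. The function names and the default step count are assumptions made for this example.

```python
import math


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def newton_schulz_orthogonalize(g, steps=20):
    """Approximate U V^T for g = U diag(s) V^T without computing an SVD.

    Hypothetical helper for illustration. Each step applies
    X <- 1.5 X - 0.5 X X^T X, which maps every singular value sigma of X to
    1.5*sigma - 0.5*sigma**3 and so pushes nonzero singular values toward 1.
    """
    fro = math.sqrt(sum(v * v for row in g for v in row))
    if fro == 0.0:
        return [row[:] for row in g]  # zero matrix: nothing to orthogonalize
    # Frobenius normalization puts all singular values in (0, 1],
    # inside the convergence region of the cubic iteration.
    x = [[v / fro for v in row] for row in g]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [
            [1.5 * xv - 0.5 * yv for xv, yv in zip(x_row, y_row)]
            for x_row, y_row in zip(x, xxt_x)
        ]
    return x


# Usage: orthogonalize a small stand-in for a momentum matrix and inspect X X^T,
# which should be close to the identity when the input has full row rank.
momentum = [[2.0, 0.0, 0.0], [0.0, 0.5, 0.0]]
update = newton_schulz_orthogonalize(momentum)
print(matmul(update, transpose(update)))
```

Because the iteration only needs matrix products, it avoids an explicit SVD. Very small singular values grow by a factor of roughly 1.5 per step, so ill-conditioned inputs need more steps than the default assumed here.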

