New optimizers AMUSE, MiMuon, and Pion enhance deep learning training

By PulseAugur Editorial · [12 sources] · 2026-05-19 00:00

Researchers have introduced several new optimizers that build on Muon to improve deep learning model training. AMUSE combines Muon's rapid adaptation with the stability of Schedule-Free averaging, removing the need for a learning rate schedule and improving performance across vision and language tasks. MiMuon blends Muon with SGD to improve generalization, and its authors report a lower generalization error than Muon alone. A third optimizer, Pion, addresses Muon's limitations in vision-language-action (VLA) training and reinforcement learning by applying a spectral high-pass filtering mechanism.
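
For readers unfamiliar with Schedule-Free averaging, the general idea can be written down for plain SGD. The recursion below is the generic Schedule-Free SGD form, not AMUSE's actual update rule, which the summary does not spell out; the symbols (base iterate z, averaged iterate x, gradient evaluation point y, step size γ, interpolation weight β) are notation assumed here for illustration.

```latex
\begin{aligned}
y_t     &= (1-\beta)\, z_t + \beta\, x_t            && \text{(point where the gradient is evaluated)} \\
z_{t+1} &= z_t - \gamma\, \nabla f(y_t)             && \text{(base step with constant step size } \gamma\text{)} \\
x_{t+1} &= (1 - c_{t+1})\, x_t + c_{t+1}\, z_{t+1}, \quad c_{t+1} = \tfrac{1}{t+1} && \text{(running average of the base iterates)}
\end{aligned}
```

Because x is a running average of the base iterates, it plays the stabilizing role that a decaying learning rate schedule would otherwise play, which is why γ can stay constant.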

IMPACT These new optimizers aim to improve training efficiency and generalization for large models, potentially accelerating development in areas like LLMs and robotics.

RANK_REASON Multiple research papers introduce novel optimization algorithms for deep learning models.


AI-generated summary · Google Gemini · from 12 sources.

COVERAGE [12]

arXiv cs.LG TIER_1 English(EN) · Ben S. Southworth, Shuai Jiang, Daniel McBride, Eric C. Cyr, Stephen Thomas · 2026-05-26 04:00

Muon in Vision Transformers: Optimizer-Recipe Interactions and Gradient Spectra

arXiv:2605.24770v1 Announce Type: new Abstract: Muon is a recently developed matrix-aware optimizer that has shown strong results in transformer training, but its behavior in vision transformers (ViTs) is not yet well understood. We study Muon for ViT training, largely on ImageNe…
arXiv cs.LG TIER_1 English(EN) · Binghui Li, Kaifei Wang, Han Zhong, Pinyan Lu, Liwei Wang · 2026-05-26 04:00

Muon in Associative Memory Learning: Training Dynamics and Scaling Laws

arXiv:2602.05725v2 Announce Type: replace Abstract: Muon updates matrix parameters via the matrix sign of the gradient and has shown strong empirical gains, yet its dynamics and scaling behavior remain unclear in theory. We study Muon in a linear associative memory model with sof…
arXiv cs.AI TIER_1 English(EN) · Fangzhou Wu, Rikhav Shah, Sandeep Silwal, Qiuyi Zhang · 2026-05-25 04:00

DynMuon: A Dynamic Spectral Shaping View of Muon

arXiv:2605.17109v2 Announce Type: replace-cross Abstract: In recent years, Muon has emerged as the dominant method for training large language models, and transformers more broadly. The essential difference, when compared to standard gradient descent methods, is to replace the us…
arXiv cs.AI TIER_1 English(EN) · Tianyu Pang, Yujie Fang, Zihang Liu, Shenyang Deng, Lei Hsiung, Shuhua Yu, Yaoqing Yang · 2026-05-25 04:00

HTMuon: Improving Muon via Heavy-Tailed Spectral Correction

arXiv:2603.10067v2 Announce Type: replace-cross Abstract: Muon has recently shown promising results in LLM training. In this work, we study how to further improve Muon. We argue that Muon's orthogonalized update rule suppresses the emergence of heavy-tailed weight spectra and ove…
arXiv cs.LG TIER_1 English(EN) · Jueun Kim, Baekrok Shin, Jihun Yun, Beomhan Baek, Minhak Song, Chulhee Yun · 2026-05-22 04:00

AMUSE: Anytime Muon with Stable Gradient Evaluation

arXiv:2605.22432v1 Announce Type: new Abstract: Modern deep learning commonly relies on AdamW with prescribed learning rate schedules, but recent works challenge both components: Schedule-Free optimization removes explicit schedules via iterate averaging, and Muon improves the up…
arXiv cs.AI TIER_1 English(EN) · Mathieu Serrurier · 2026-05-19 12:47

From SGD to Muon: Adaptive Optimization via Schatten-p Norms

Modern optimizers, like Muon, impose matrix-wise geometry constraints on their updates. These matrix-wise constraints can be unified under Linear Minimization Oracle (LMO) theory. However, all current methods impose fixed LMO geometries for the update rules, chosen by-design or e…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-19 03:00

Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR

Muon is a matrix-aware optimizer that leverages Newton-Schulz (NS) iterations to enforce spectral gradient orthogonalization by driving all singular values of the momentum matrix toward 1. While this uniform spectral whitening enhances exploration and outperforms AdamW in LLM pre…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-19 00:00

Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR

Pion replaces the uniform spectral whitening that Muon applies in LLM pretraining with a high-pass NS iteration, which stabilizes training in low-rank and low-SNR regimes while maintaining computational efficiency and supporting per-head updates.
arXiv stat.ML TIER_1 English(EN) · Aratrika Mustafi, Soumya Mukherjee, Bharath K. Sriperumbudur · 2026-05-25 04:00

Move on Muon : A Hamiltonian probability gradient flow perspective of Muon optimizer

arXiv:2605.23871v1 Announce Type: new Abstract: We develop a gradient flow on the space of probability measures defined on matrix-valued parameters induced by regularized Muon, an analytically smoothed version of the idealized Muon optimizer. The key observation is that the regul…
arXiv stat.ML TIER_1 English(EN) · Bharath K. Sriperumbudur · 2026-05-22 17:28

Move on Muon : A Hamiltonian probability gradient flow perspective of Muon optimizer

We develop a gradient flow on the space of probability measures defined on matrix-valued parameters induced by regularized Muon, an analytically smoothed version of the idealized Muon optimizer. The key observation is that the regularized orthogonalization map is the gradient of …
arXiv stat.ML TIER_1 English(EN) · Feihu Huang, Yuning Luo, Songcan Chen · 2026-05-20 04:00

MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models

arXiv:2605.19619v1 Announce Type: cross Abstract: Matrix-structured parameters frequently appear in many artificial intelligence models such as large language models. More recently, an efficient Muon optimizer is designed for matrix parameters of large-scale models, and shows mar…
arXiv stat.ML TIER_1 English(EN) · Songcan Chen · 2026-05-19 09:56

MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models

Matrix-structured parameters frequently appear in many artificial intelligence models such as large language models. More recently, an efficient Muon optimizer is designed for matrix parameters of large-scale models, and shows markedly faster convergence than the vector-wise algo…
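
Several of the abstracts above describe Muon as orthogonalizing the momentum matrix with Newton-Schulz (NS) iterations, driving all of its singular values toward 1. The following is a minimal sketch of that idea, assuming the classic cubic Newton-Schulz iteration rather than the tuned polynomial used in any particular Muon implementation. The helper names (`matmul`, `transpose`, `newton_schulz_orthogonalize`) and the example matrix are hypothetical, and pure Python is used only to keep the sketch dependency-free.

```python
# Illustrative sketch only: classic cubic Newton-Schulz iteration
# X <- 1.5 * X - 0.5 * X X^T X, which pushes every singular value of a
# full-rank matrix toward 1. Real Muon implementations use a tuned
# polynomial and few steps; this is NOT their exact update.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(m, steps=30):
    """Approximate the orthogonal (polar) factor of a full-rank matrix m."""
    # Scale by the Frobenius norm so all singular values are at most 1,
    # which keeps the cubic iteration inside its convergence region.
    fro = sum(v * v for row in m for v in row) ** 0.5
    x = [[v / fro for v in row] for row in m]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(r1, r2)]
             for r1, r2 in zip(x, xxt_x)]
    return x


# Hypothetical 2x2 "momentum" matrix with unequal singular values.
momentum = [[2.0, 0.0], [1.0, 1.0]]
update = newton_schulz_orthogonalize(momentum)

# If all singular values are near 1, update^T @ update is near the identity.
gram = matmul(transpose(update), update)
```

The "spectral whitening" described in the Pion abstract corresponds to `gram` ending up close to the identity: every direction in the update receives the same magnitude, regardless of how large or small its singular value was in the original momentum matrix.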
