Muon optimizer needs less orthogonalization than previously thought

By PulseAugur Editorial · [1 source] · 2026-06-02 04:00

Researchers have investigated how much orthogonalization the Muon optimizer actually needs. Muon improves neural-network training by replacing ill-conditioned momentum updates with approximately semi-orthogonal ones. The study used a simplified cubic Newton-Schulz schedule to examine the relationship between polar accuracy, spectral shaping, and training performance. The findings indicate that training quality does not closely track polar decomposition accuracy: the methods compared reached nearly identical final losses on GPT-2 Small and comparable validation losses on larger MoE/Mamba models.
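
For readers unfamiliar with the technique, the following is a minimal sketch of the classical cubic Newton-Schulz iteration, which pushes the singular values of a matrix toward 1 and so approximates its polar factor. It is not the paper's schedule, whose coefficients and step counts are not given in this summary. The function names, the matrix shape, and the step counts are illustrative assumptions.

```python
import numpy as np


def newton_schulz_cubic(m, steps):
    """Approximate the polar factor of m with the classical cubic iteration.

    Assumption: textbook coefficients (1.5, -0.5), not the paper's schedule.
    """
    # The Frobenius norm bounds the spectral norm, so after scaling every
    # singular value lies in (0, 1], inside the iteration's convergence region.
    x = m / (np.linalg.norm(m) + 1e-12)
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T  # work in the wide orientation so x @ x.T is the smaller Gram matrix
    for _ in range(steps):
        # Each singular value s is mapped to 1.5 * s - 0.5 * s**3.
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x.T if transposed else x


def orthogonality_error(x):
    """Frobenius distance between the smaller Gram matrix of x and the identity."""
    rows, cols = x.shape
    gram = x @ x.T if rows <= cols else x.T @ x
    return np.linalg.norm(gram - np.eye(gram.shape[0]))


# Hypothetical stand-in for a momentum matrix.
rng = np.random.default_rng(0)
momentum = rng.standard_normal((16, 32))

step_counts = (0, 5, 10, 20, 40)
errors = [orthogonality_error(newton_schulz_cubic(momentum, k)) for k in step_counts]
for k, err in zip(step_counts, errors):
    print(f"steps={k:2d}  orthogonality error={err:.3e}")
```

Fewer steps leave the update further from semi-orthogonal but cost fewer matrix multiplications. That trade-off between polar accuracy and compute is the one the study examines.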

IMPACT Suggests a more efficient approach to neural network training, potentially reducing computational costs.

RANK_REASON This is a research paper detailing a new method for optimizing neural network training.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.LG TIER_1 English(EN) · Hua Huang · 2026-06-02 04:00

How Much Orthogonalization Does Muon Need?

arXiv:2606.00371v1 Announce Type: new Abstract: Muon optimizers improve neural-network training by replacing ill-conditioned momentum updates with approximately semi-orthogonal updates. This motivates a practical question: how much orthogonalization does Muon actually require? We…
