New Muon^p Optimizer Enhances Fine-Tuning of Large Models

By PulseAugur Editorial · [1 source] · 2026-06-15 04:00

Researchers have introduced Muon$^p$, an optimizer that modifies the existing Muon optimizer by applying fractional spectral-power updates. The method interpolates between Muon's full spectral flattening and standard gradient descent, with the aim of preserving singular-value information that full flattening discards. The authors report that Muon$^p$ is particularly effective for fine-tuning large-scale models, improving validation perplexity and downstream task performance at a computational cost similar to Muon's.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.LG TIER_1 English(EN) · Yihe Dong, Will Sawin · 2026-06-15 04:00

Muon$^p$: Muon with Fractional Spectral Powers

arXiv:2606.13867v1 Announce Type: new Abstract: Muon is an increasingly widely used optimizer that replaces a gradient $G=USV^\top$ with its polar factor $UV^\top$, thereby flattening the singular spectrum. However, full flattening discards singular-value information that may mat…
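To make the abstract's notation concrete, here is a minimal NumPy sketch of a spectral-power update. It assumes the fractional power is applied directly to the singular values, giving $U S^p V^\top$ with $p$ between 0 and 1; the truncated abstract does not spell out the paper's exact update rule, so this form, the function name `spectral_power_update`, and the use of an explicit SVD are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def spectral_power_update(grad, p):
    """Return U @ diag(S**p) @ V^T for grad = U @ diag(S) @ V^T.

    Hypothetical helper for illustration only. An explicit SVD is used
    for clarity; it is not claimed to be how Muon or Muon^p computes
    its update in practice.
    """
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    # Scale each column of U by the matching singular value raised to p.
    return (u * s**p) @ vt


rng = np.random.default_rng(0)
G = rng.standard_normal((4, 3))

flat = spectral_power_update(G, 0.0)  # p = 0: polar factor U V^T (fully flattened spectrum)
same = spectral_power_update(G, 1.0)  # p = 1: the original gradient G
half = spectral_power_update(G, 0.5)  # 0 < p < 1: partially flattened spectrum
```

Under this assumed form, `p = 0` reproduces the polar factor described in the abstract, `p = 1` leaves the gradient unchanged, and intermediate values compress the singular spectrum without erasing it.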
