Muon$^p$: Muon with Fractional Spectral Powers
Researchers have introduced Muon$^p$, an optimization technique that refines the existing Muon optimizer by applying fractional spectral powers to its updates. The method interpolates between full spectral flattening and standard gradient descent, aiming to preserve useful singular-value information for better adaptation. Muon$^p$ is reported to be particularly effective for fine-tuning large-scale models, improving validation perplexity and downstream task performance while keeping computational complexity similar to Muon's.
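To make the interpolation concrete, here is a minimal sketch of one plausible reading of a fractional spectral-power update: take the SVD $G = U \,\mathrm{diag}(s)\, V^\top$ and rescale the singular values to $s^p$. The function name `spectral_power`, the exponent convention ($p = 0$ for full flattening, $p = 1$ for the unmodified gradient), and the use of an explicit SVD are all assumptions made for illustration. The summary above does not specify the exact parameterization, and Muon itself approximates the flattening step with Newton-Schulz iterations rather than an SVD.

```python
import numpy as np


def spectral_power(G, p):
    """Return U diag(s**p) V^T, where G = U diag(s) V^T is the thin SVD.

    Hypothetical helper for illustration only:
      p = 1 reproduces G (plain gradient direction),
      p = 0 sets every singular value to 1 (full spectral flattening),
      0 < p < 1 compresses the spectrum while keeping its ordering.
    """
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    s_p = s ** p  # elementwise fractional power of the singular values
    return (U * s_p) @ Vt  # scales column j of U by s_p[j], then recombines


# Example: compare the spectrum of a random "gradient" for several exponents.
rng = np.random.default_rng(0)
G = rng.standard_normal((4, 3))
for p in (0.0, 0.5, 1.0):
    update = spectral_power(G, p)
    print(p, np.linalg.svd(update, compute_uv=False))
```

Under this assumed convention, intermediate values of $p$ keep the ordering of the singular values but shrink the gaps between them, which is the sense in which some singular-value information survives.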