Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR
Researchers have developed several new optimization techniques to improve deep learning model training. AMUSE combines the rapid adaptation of Muon with the stability of Schedule-Free averaging, eliminating the need for learning rate schedules and improving performance across vision and language tasks. Another approach, MiMuon, enhances the generalization capabilities of Muon by blending it with SGD, offering a lower generalization error. Additionally, a new optimizer called Pion addresses Muon's limitations in vision-language-action and reinforcement learning by employing a spectral high-pass filtering mechanism. AI
IMPACT These new optimizers aim to improve training efficiency and generalization for large models, potentially accelerating development in areas like LLMs and robotics.