
New Optimizers AMUSE, MiMuon, and Pion Improve Deep Learning Training

By the PulseAugur Editorial Team · [12 sources] · 2026-05-19 00:00

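Researchers have developed several new optimization techniques to improve the training of deep learning models. AMUSE combines Muon's fast adaptation with the stability of schedule-free averaging, improving performance on vision and language tasks without a learning rate schedule. Another method, MiMuon, improves Muon's generalization by blending it with SGD, yielding lower generalization error. In addition, a new optimizer called Pion addresses Muon's limitations in vision-language-action (VLA) training and in reinforcement learning (RLVR) by using a spectral high-pass filtering mechanism.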

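Impact: These new optimizers aim to improve the training efficiency and generalization of large models, and could accelerate progress in areas such as LLMs and robotics.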

Ranking rationale: Multiple research papers introduce new optimization algorithms for deep learning models.


AI-generated summary · Google Gemini · from 12 sources.

Sources [12]

arXiv cs.LG TIER_1 English(EN) · Ben S. Southworth, Shuai Jiang, Daniel McBride, Eric C. Cyr, Stephen Thomas · 2026-05-26 04:00

Muon in Vision Transformers: Optimizer-Recipe Interactions and Gradient Spectra

arXiv:2605.24770v1 Announce Type: new Abstract: Muon is a recently developed matrix-aware optimizer that has shown strong results in transformer training, but its behavior in vision transformers (ViTs) is not yet well understood. We study Muon for ViT training, largely on ImageNe…
arXiv cs.LG TIER_1 English(EN) · Binghui Li, Kaifei Wang, Han Zhong, Pinyan Lu, Liwei Wang · 2026-05-26 04:00

Muon in Associative Memory Learning: Training Dynamics and Scaling Laws

arXiv:2602.05725v2 Announce Type: replace Abstract: Muon updates matrix parameters via the matrix sign of the gradient and has shown strong empirical gains, yet its dynamics and scaling behavior remain unclear in theory. We study Muon in a linear associative memory model with sof…
arXiv cs.AI TIER_1 English(EN) · Fangzhou Wu, Rikhav Shah, Sandeep Silwal, Qiuyi Zhang · 2026-05-25 04:00

DynMuon: A Dynamic Spectral Shaping Perspective on Muon

arXiv:2605.17109v2 Announce Type: replace-cross Abstract: In recent years, Muon has emerged as the dominant method for training large language models, and transformers more broadly. The essential difference, when compared to standard gradient descent methods, is to replace the us…
arXiv cs.AI TIER_1 English(EN) · Tianyu Pang, Yujie Fang, Zihang Liu, Shenyang Deng, Lei Hsiung, Shuhua Yu, Yaoqing Yang · 2026-05-25 04:00

HTMuon: Improving Muon via Heavy-Tailed Spectral Correction

arXiv:2603.10067v2 Announce Type: replace-cross Abstract: Muon has recently shown promising results in LLM training. In this work, we study how to further improve Muon. We argue that Muon's orthogonalized update rule suppresses the emergence of heavy-tailed weight spectra and ove…
arXiv cs.LG TIER_1 English(EN) · Jueun Kim, Baekrok Shin, Jihun Yun, Beomhan Baek, Minhak Song, Chulhee Yun · 2026-05-22 04:00

AMUSE: Anytime Muon with Stable Gradient Evaluation

arXiv:2605.22432v1 Announce Type: new Abstract: Modern deep learning commonly relies on AdamW with prescribed learning rate schedules, but recent works challenge both components: Schedule-Free optimization removes explicit schedules via iterate averaging, and Muon improves the up…
arXiv cs.AI TIER_1 English(EN) · Mathieu Serrurier · 2026-05-19 12:47

From SGD to Muon: Adaptive Optimization via Schatten-p Norms

Modern optimizers, like Muon, impose matrix-wise geometry constraints on their updates. These matrix-wise constraints can be unified under Linear Minimization Oracle (LMO) theory. However, all current methods impose fixed LMO geometries for the update rules, chosen by-design or e…
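
The unification the abstract refers to can be illustrated with the textbook Linear Minimization Oracle over a Schatten-p unit ball. The sketch below is an illustration only, not the paper's adaptive method (the abstract is truncated before that is described); the function name `schatten_lmo` and the test matrix are hypothetical. Under this standard construction, p = 2 yields a Frobenius-normalized gradient step (SGD-like), while p = ∞ yields the orthogonalized update associated with Muon.

```python
import numpy as np

def schatten_lmo(G, p):
    """Return argmin of <G, X> over the Schatten-p unit ball (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    if np.isinf(p):
        # Spectral-norm ball: every singular value of the update is set to 1.
        d = np.ones_like(s)
    elif p == 1:
        # Nuclear-norm ball: only the top singular direction is kept.
        d = np.zeros_like(s)
        d[0] = 1.0
    else:
        # General case via Hoelder duality, with dual exponent q (1/p + 1/q = 1).
        q = p / (p - 1.0)
        d = s ** (q - 1.0) / np.linalg.norm(s, ord=q) ** (q - 1.0)
    # Scale the columns of U by d, then negate to get the descent direction.
    return -(U * d) @ Vt

rng = np.random.default_rng(0)
G = rng.standard_normal((3, 5))          # stand-in for a gradient matrix

step_sgd_like = schatten_lmo(G, 2)       # equals -G / ||G||_F
step_muon_like = schatten_lmo(G, np.inf) # equals -U V^T
step_rank_one = schatten_lmo(G, 1)       # rank-1 update
```

Intermediate values of p interpolate between these regimes by reweighting the singular values of the update, which is one way to read the "fixed LMO geometry" the abstract says current methods impose.
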
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-19 03:00

Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR

Muon is a matrix-aware optimizer that leverages Newton-Schulz (NS) iterations to enforce spectral gradient orthogonalization by driving all singular values of the momentum matrix toward 1. While this uniform spectral whitening enhances exploration and outperforms AdamW in LLM pre…
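
As a rough illustration of the Newton-Schulz step described in this abstract, the sketch below uses the classical cubic iteration, which drives every singular value of a matrix toward 1. This is an assumption for clarity: practical Muon implementations typically use a tuned higher-order polynomial with only a few steps, and Pion's high-pass variant is not reproduced here. The function name, step count, and test matrix are hypothetical.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=30):
    """Approximate the orthogonal polar factor of G (hypothetical helper)."""
    # Scale by the Frobenius norm so all singular values lie in (0, 1];
    # the cubic iteration converges for singular values in (0, sqrt(3)).
    X = G / (np.linalg.norm(G) + 1e-12)
    for _ in range(steps):
        # Each singular value s is mapped to 1.5*s - 0.5*s**3, whose
        # attracting fixed point is s = 1.
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

# Build a 4x6 stand-in for a momentum matrix with a known, uneven spectrum.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
V, _ = np.linalg.qr(rng.standard_normal((6, 4)))
spectrum = np.array([5.0, 1.0, 0.2, 0.05])
G = U @ np.diag(spectrum) @ V.T

O = newton_schulz_orthogonalize(G)
singular_values = np.linalg.svd(O, compute_uv=False)
polar = U @ V.T                          # exact whitened update, for comparison
```

After the iteration, `singular_values` are all close to 1 regardless of the original spectrum, which is the "uniform spectral whitening" the abstract mentions: weak directions (here 0.05) are amplified to the same magnitude as dominant ones (here 5.0).
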
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-19 00:00

Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR

Muon's spectral whitening approach in LLM pretraining is replaced by Pion, which uses a high-pass NS iteration to stabilize training in low-rank and low-SNR regimes while maintaining computational efficiency and supporting per-head updates.
arXiv stat.ML TIER_1 English(EN) · Aratrika Mustafi, Soumya Mukherjee, Bharath K. Sriperumbudur · 2026-05-25 04:00

Move on Muon: A Hamiltonian Probability Gradient Flow Perspective on the Muon Optimizer

arXiv:2605.23871v1 Announce Type: new Abstract: We develop a gradient flow on the space of probability measures defined on matrix-valued parameters induced by regularized Muon, an analytically smoothed version of the idealized Muon optimizer. The key observation is that the regul…
arXiv stat.ML TIER_1 English(EN) · Bharath K. Sriperumbudur · 2026-05-22 17:28

Move on Muon: A Hamiltonian Probability Gradient Flow Perspective on the Muon Optimizer

We develop a gradient flow on the space of probability measures defined on matrix-valued parameters induced by regularized Muon, an analytically smoothed version of the idealized Muon optimizer. The key observation is that the regularized orthogonalization map is the gradient of …
arXiv stat.ML TIER_1 English(EN) · Feihu Huang, Yuning Luo, Songcan Chen · 2026-05-20 04:00

MiMuon: A Mixed Muon Optimizer for Large Models with Improved Generalization

arXiv:2605.19619v1 Announce Type: cross Abstract: Matrix-structured parameters frequently appear in many artificial intelligence models such as large language models. More recently, an efficient Muon optimizer is designed for matrix parameters of large-scale models, and shows mar…
arXiv stat.ML TIER_1 English(EN) · Songcan Chen · 2026-05-19 09:56

MiMuon: A Mixed Muon Optimizer for Large Models with Improved Generalization

Matrix-structured parameters frequently appear in many artificial intelligence models such as large language models. More recently, an efficient Muon optimizer is designed for matrix parameters of large-scale models, and shows markedly faster convergence than the vector-wise algo…
