CacheMuon: Using Temporal Preconditioning To Approximate Polar Factor

CacheMuon optimizes AI training by reusing temporal preconditioning information

By the PulseAugur editorial team · [1 source] · 2026-06-16 04:00

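Researchers have introduced CacheMuon, a temporal preconditioning method that reduces the cost of computing the polar factor in the Muon optimizer. By exploiting the temporal correlation of these factors across training iterations, CacheMuon reuses information from earlier iterations to approximate the current polar factor instead of recomputing it from scratch, which removes redundant computation. The method offers a controllable trade-off between compute and model quality: in language-model and vision training, it substantially reduces orthogonalization FLOPs with only a small drop in validation quality.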
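To make the compute/quality knob concrete, here is a deliberately naive NumPy sketch of temporal reuse. It is not CacheMuon's algorithm, whose details are not given in this summary: it simply recomputes the polar factor once every `refresh_every` steps and returns the stale cached factor in between. `StalePolarCache`, `polar_factor_svd`, and `refresh_every` are hypothetical names; an exact SVD stands in for Muon's Newton-Schulz iteration, and the drifting momentum matrix is synthetic.

```python
import numpy as np


def polar_factor_svd(m: np.ndarray) -> np.ndarray:
    """Exact polar factor U @ Vt of m from a thin SVD (a reference, not fast)."""
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt


class StalePolarCache:
    """Hypothetical naive temporal reuse (NOT CacheMuon's method).

    Recomputes the polar factor once every `refresh_every` calls and
    returns the cached, possibly stale, factor in between.
    """

    def __init__(self, refresh_every: int) -> None:
        if refresh_every < 1:
            raise ValueError("refresh_every must be >= 1")
        self.refresh_every = refresh_every
        self.step = 0
        self.cached = None
        self.recomputations = 0

    def __call__(self, momentum: np.ndarray) -> np.ndarray:
        # Pay for an orthogonalization on the first call and every
        # `refresh_every` calls after that; otherwise reuse the cache.
        if self.cached is None or self.step % self.refresh_every == 0:
            self.cached = polar_factor_svd(momentum)
            self.recomputations += 1
        self.step += 1
        return self.cached


# Synthetic, slowly drifting momentum matrix (an assumption for illustration).
rng = np.random.default_rng(0)
momentum = rng.standard_normal((8, 4))
cache = StalePolarCache(refresh_every=4)
for t in range(8):
    momentum = 0.95 * momentum + 0.05 * rng.standard_normal((8, 4))
    approx = cache(momentum)
    # Frobenius distance between the reused factor and the exact one.
    err = np.linalg.norm(approx - polar_factor_svd(momentum))
    print(f"step {t}: error {err:.3e}")
print("orthogonalizations:", cache.recomputations)  # orthogonalizations: 2
```

With `refresh_every=4`, only two of the eight steps pay for an orthogonalization. The other six return a factor that was exact for an earlier momentum and is only approximate for the current one. A larger interval saves more compute at the cost of a staler factor. CacheMuon is described as using earlier information to approximate the current factor rather than returning an unchanged stale one.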

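Impact: CacheMuon offers a controllable quality-efficiency frontier for AI training and could lower compute costs for language-model and vision workloads.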

Ranking rationale: This is an academic paper describing a new method for optimizing the AI training process.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.LG · Bishnu Dev (Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE), Sushil Bohara (Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE), Martin Takáč (Mohamed bin Zayed University of Artificial Intelligence, Ab… · 2026-06-16 04:00

CacheMuon: Using Temporal Preconditioning To Approximate Polar Factor

arXiv:2606.16371v1 · Announce Type: new

Abstract: Muon is an optimizer that computes updates using the polar factor of the momentum matrix and has shown strong empirical performance across a range of training settings. A key component of Muon is the Newton-Schulz iteration used to …
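The abstract names the Newton-Schulz iteration as the step Muon uses to obtain the polar factor. As a minimal NumPy sketch, assuming the textbook cubic iteration X <- 1.5 X - 0.5 X X^T X, the function below approximates the polar factor of a full-rank matrix. Muon's reference implementation is, to our understanding, a tuned quintic variant run for a small fixed number of steps, so this is a simplified stand-in. `newton_schulz_polar` and its `steps` default are hypothetical choices, not taken from the paper.

```python
import numpy as np


def newton_schulz_polar(m: np.ndarray, steps: int = 30, eps: float = 1e-12) -> np.ndarray:
    """Approximate the polar factor of a full-rank matrix m with the cubic
    Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X (hypothetical helper).
    """
    # Frobenius scaling puts every singular value in (0, 1], inside the
    # iteration's convergence region (0, sqrt(3)).
    x = m / (np.linalg.norm(m) + eps)
    for _ in range(steps):
        # Two matrix products per step: x.T @ x, then x @ (that Gram matrix).
        x = 1.5 * x - 0.5 * x @ (x.T @ x)
    return x


# Build a matrix with a known polar factor u @ v.T (singular values 2, 1, 0.5, 0.25).
rng = np.random.default_rng(1)
u, _ = np.linalg.qr(rng.standard_normal((6, 4)))  # 6x4, orthonormal columns
v, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # 4x4 orthogonal
m = u @ np.diag([2.0, 1.0, 0.5, 0.25]) @ v.T

q = newton_schulz_polar(m)
print(np.allclose(q, u @ v.T))  # True
```

Because the test matrix is built as `u @ diag(s) @ v.T` with positive `s`, its exact polar factor is `u @ v.T`, which gives a ground truth to check against. Each iteration costs two matrix products per weight matrix and is repeated at every optimizer step; that per-step orthogonalization work is what the summary says CacheMuon reduces.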
