PulseAugur

Muon optimizer analysis reveals distinct convergence phases vs. SignSGD

Researchers analyzed stochastic spectral optimizers, including Muon, on a high-dimensional matrix-valued least-squares problem. For large batch sizes, SignSVD, which Muon approximates, applies a square-root preconditioning with respect to the spectrum of the data covariance. For small batch sizes, the smaller eigenmodes instead behave like SGD, which slows convergence. SignSGD, by contrast, provides no preconditioning for generic covariance, so the two methods differ in their optimal learning rates and convergence behavior.
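To make the distinction between the two update rules concrete, here is a minimal NumPy sketch. It assumes the usual definitions: SignSGD takes the entrywise sign of the gradient matrix, while SignSVD keeps the gradient's singular vectors and sets every nonzero singular value to 1. The function names are hypothetical and not taken from the paper, and the Newton-Schulz routine uses the classical cubic iteration as a stand-in, not the tuned polynomial found in Muon implementations.

```python
import numpy as np


def sign_sgd_direction(grad):
    # SignSGD: entrywise sign of the gradient, ignoring its spectral structure.
    return np.sign(grad)


def sign_svd_direction(grad, tol=1e-12):
    # SignSVD: keep the singular vectors and replace each nonzero
    # singular value with 1 (the polar factor of the gradient matrix).
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    keep = (s > tol).astype(float)
    return (u * keep) @ vt


def newton_schulz_direction(grad, steps=40):
    # Assumption: classical cubic Newton-Schulz iteration, shown only to
    # illustrate approximating SignSVD without an explicit SVD.
    # Frobenius scaling puts all singular values in (0, 1], where the
    # iteration x -> 1.5 x - 0.5 x x^T x drives each of them toward 1.
    x = grad / (np.linalg.norm(grad) + 1e-12)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grad = rng.standard_normal((4, 3))
    # Singular values of the SignSVD direction.
    print(np.linalg.svd(sign_svd_direction(grad), compute_uv=False))
    # Largest entrywise gap between the iterative and exact directions.
    print(np.abs(newton_schulz_direction(grad) - sign_svd_direction(grad)).max())
```

The SignSVD direction depends on the gradient only through its singular vectors, whereas the SignSGD direction depends on the coordinate basis in which the matrix is written. That difference is what makes a comparison of their preconditioning behavior meaningful.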

Impact: Provides theoretical insight into the behavior of optimization algorithms used in machine learning, which may guide future algorithm development.

Ranking rationale: Academic paper analyzing optimization algorithms.

Read on arXiv stat.ML →

AI-generated summary · Google Gemini · from 1 source. How we write summaries →

Sources [1]

  1. arXiv stat.ML · TIER_1 · English (EN) · Courtney Paquette

    Phases of Muon: When Muon Eclipses SignSGD

    Recently, Muon and related spectral optimizers have demonstrated strong empirical performance as scalable stochastic methods, often outperforming Adam. Yet their behaviour remains poorly understood. We analyze stochastic spectral optimizers, including Muon, on a high-dimensional …