PulseAugur
LionMuon optimizer cuts training cost for large models

Researchers have introduced LionMuon, an optimization algorithm designed for efficient training of large-scale models. The method alternates between the cheap sign-based updates of Lion and the stronger but more expensive spectral updates of Muon. By sharing a single momentum buffer across both update types, LionMuon reduces the average per-iteration cost while maintaining effectiveness. In the reported experiments, LionMuon outperforms Muon, Lion, Signum, and AdamW across various model sizes and datasets, achieving lower validation loss with less compute.
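The alternation described above can be illustrated with a minimal NumPy sketch. This is not the paper's algorithm: the function names `matrix_sign` and `lionmuon_step`, the fixed schedule `muon_every`, the hyperparameters, and the use of a plain Signum-style sign of the momentum (real Lion uses two interpolation coefficients) are all assumptions made for illustration. The spectral step here uses an exact SVD for clarity rather than an iterative approximation.

```python
import numpy as np

def matrix_sign(m):
    """Exact spectral 'matrix sign' U @ V^T of m = U S V^T, computed via SVD."""
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt

def lionmuon_step(w, grad, momentum, step, lr=0.01, beta=0.9, muon_every=4):
    """One hypothetical alternating step; returns (new_w, new_momentum).

    Assumptions (not taken from the paper): EMA momentum, one spectral step
    every `muon_every` iterations, no weight decay and no update scaling.
    """
    # A single momentum buffer is shared by both kinds of update.
    momentum = beta * momentum + (1.0 - beta) * grad
    if step % muon_every == 0:
        direction = matrix_sign(momentum)  # expensive spectral (Muon-style) update
    else:
        direction = np.sign(momentum)      # cheap elementwise sign (Signum-style) update
    return w - lr * direction, momentum

# Toy usage: minimize 0.5 * ||W - target||_F^2 for a small matrix W.
rng = np.random.default_rng(0)
target = rng.standard_normal((4, 3))
w = np.zeros((4, 3))
m = np.zeros_like(w)
initial_loss = 0.5 * np.sum((w - target) ** 2)
for t in range(200):
    grad = w - target  # gradient of the quadratic toy loss
    w, m = lionmuon_step(w, grad, m, t)
final_loss = 0.5 * np.sum((w - target) ** 2)
```

With `muon_every=4`, only one step in four pays for the spectral computation, which is the sense in which alternating lowers the average per-iteration cost in this sketch.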

Impact: Introduces a new optimization technique that could significantly reduce the computational cost of training large AI models.

Ranking rationale: The cluster contains a new academic paper detailing a novel optimization algorithm for machine learning.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 1 source. How we write summaries →

Sources [1]

  1. arXiv cs.LG · English (EN) · Aleksandr Beznosikov

    LionMuon: Alternating Spectral and Sign Descent for Efficient Training

    In large-scale optimization, the cheapness and effectiveness of update steps are the most crucial factors for a successful optimizer. Sign-based optimizers like Lion or Signum produce cheap per-step updates, whereas Muon's spectral matrix-sign update gives a much stronger directi…
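The cost contrast between the two update types can be sketched in NumPy. An elementwise sign is a single pass over the entries, whereas a matrix sign requires repeated matrix multiplications when approximated iteratively. Muon implementations typically use a tuned Newton-Schulz iteration for this; the sketch below swaps in the classical cubic Newton-Schulz iteration as a simpler stand-in, and the function name `newton_schulz_sign`, the iteration count, and the test matrix are assumptions for illustration only.

```python
import numpy as np

def newton_schulz_sign(g, iters=30):
    """Approximate U @ V^T for g = U S V^T with the classical cubic Newton-Schulz iteration."""
    x = g / np.linalg.norm(g)  # Frobenius normalization puts all singular values in (0, 1]
    for _ in range(iters):
        # Maps each singular value s to 1.5*s - 0.5*s**3, which converges to 1.
        x = 1.5 * x - 0.5 * x @ x.T @ x  # two matrix multiplications per iteration
    return x

rng = np.random.default_rng(0)
# Build a test matrix with known singular values 3, 2, 1 so its conditioning is controlled.
u, _ = np.linalg.qr(rng.standard_normal((4, 3)))
v, _ = np.linalg.qr(rng.standard_normal((3, 3)))
g = u @ np.diag([3.0, 2.0, 1.0]) @ v.T

sign_update = np.sign(g)                 # elementwise sign: every entry becomes +1 or -1
spectral_update = newton_schulz_sign(g)  # matrix sign: every singular value becomes 1
exact = u @ v.T                          # exact matrix sign, known by construction
```

In this sketch, `spectral_update` should agree with `exact` to numerical precision, while `sign_update` is a different matrix whose entries all have magnitude one.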