LionMuon optimizer cuts training cost for large models

By PulseAugur Editorial · [1 source] · 2026-05-19 13:07

Researchers have introduced LionMuon, an optimization algorithm for training large-scale models efficiently. The method alternates between the cheap sign-based updates of Lion and the stronger but more expensive spectral updates of Muon. Both update types share a single momentum buffer, and because only some steps pay for the spectral update, the average iteration cost drops while effectiveness is maintained. In the reported experiments, LionMuon reaches lower validation loss with less compute than Muon, Lion, Signum, and AdamW across a range of model sizes and datasets.
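The alternation described in the summary can be illustrated with a minimal NumPy sketch. It is not the paper's algorithm: the function names, the fixed `spectral_every` schedule, and the hyperparameter defaults are assumptions made for illustration. The cheap step uses a Signum-style sign of the momentum (a simplification of Lion's two-coefficient rule), and the spectral step approximates Muon's matrix-sign update with the plain cubic Newton-Schulz iteration rather than Muon's tuned polynomial.

```python
import numpy as np


def newton_schulz_orthogonalize(m, steps=5, eps=1e-7):
    """Approximate the matrix sign U @ V.T of m (from its SVD) with the
    cubic Newton-Schulz iteration X <- 1.5 X - 0.5 (X X^T) X."""
    # Frobenius normalization keeps every singular value at or below 1,
    # which is inside the iteration's convergence region.
    x = m / (np.linalg.norm(m) + eps)
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T  # iterate on the wide orientation so X X^T is the smaller Gram matrix
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x.T if transposed else x


def lionmuon_like_step(w, grad, m, step, lr=1e-3, beta=0.9, spectral_every=4):
    """One update of a hypothetical alternating optimizer for a 2-D weight matrix.

    Assumption: a spectral step every `spectral_every` iterations and a
    sign step otherwise; the actual schedule in the paper may differ.
    """
    # A single momentum buffer feeds both kinds of update.
    m = beta * m + (1.0 - beta) * grad
    if step % spectral_every == 0:
        direction = newton_schulz_orthogonalize(m)  # expensive, Muon-like
    else:
        direction = np.sign(m)  # cheap, sign-based
    return w - lr * direction, m


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((8, 4))
    m = np.zeros_like(w)
    for step in range(8):
        grad = w  # gradient of the toy loss 0.5 * ||w||_F^2
        w, m = lionmuon_like_step(w, grad, m, step)
    print(w.shape, m.shape)
```

With `spectral_every=4`, only one iteration in four pays for the matrix multiplications inside the Newton-Schulz loop, which is where the lower average cost per step comes from in this sketch. The sign and spectral directions have different norms, and the sketch makes no attempt to rescale them against each other.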

IMPACT Introduces a new optimization technique that could significantly reduce the computational cost of training large AI models.

RANK_REASON The cluster contains a new academic paper detailing a novel optimization algorithm for machine learning.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.LG TIER_1 English(EN) · Aleksandr Beznosikov · 2026-05-19 13:07

LionMuon: Alternating Spectral and Sign Descent for Efficient Training

In large-scale optimization, the cheapness and effectiveness of update steps are the most crucial factors for a successful optimizer. Sign-based optimizers like Lion or Signum produce cheap per-step updates, whereas Muon's spectral matrix-sign update gives a much stronger directi…
