Adam and Muon optimizers show implicit bias in neural networks

By PulseAugur Editorial · [1 source] · 2026-05-26 04:00

Researchers have analyzed the implicit bias of momentum-based optimizers, including Adam and Muon, on smooth homogeneous neural networks. They show that momentum steepest descent algorithms such as Muon (spectral norm), MomentumGD (L2 norm), and Signum (L-infinity norm) follow approximate steepest descent trajectories under specific learning rate schedules. As a result, these algorithms are biased toward KKT points of the corresponding margin maximization problem. Adam, in particular, is shown to maximize the L-infinity margin.
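To make the "steepest descent under a norm" idea concrete, here is a minimal NumPy sketch of one momentum steepest descent step. It is an illustration, not the paper's formulation: the function names `steepest_direction` and `momentum_steepest_step`, the exponential-moving-average form of the momentum buffer, the normalized L2 direction, and the exact SVD used for the spectral case (practical Muon implementations typically approximate this orthogonalization) are all assumptions made for this example.

```python
import numpy as np

def steepest_direction(m, norm):
    """Return a unit-norm direction d maximizing <m, d> under the chosen norm.

    Hypothetical helper for illustration only.
    """
    if norm == "l2":
        # L2 norm: normalized momentum (MomentumGD-style; normalization is an assumption).
        n = np.linalg.norm(m)
        return m / n if n > 0 else np.zeros_like(m)
    if norm == "linf":
        # L-infinity norm: coordinate-wise sign (Signum-style).
        return np.sign(m)
    if norm == "spectral":
        # Spectral norm: orthogonalize the matrix via its SVD (Muon-style).
        u, _, vt = np.linalg.svd(m, full_matrices=False)
        return u @ vt
    raise ValueError(f"unknown norm: {norm}")

def momentum_steepest_step(w, grad, m, lr, beta, norm):
    """One step: update the momentum buffer, then move along the steepest direction."""
    m = beta * m + (1.0 - beta) * grad          # assumed EMA form of momentum
    w = w - lr * steepest_direction(m, norm)    # step size lr would follow a schedule
    return w, m
```

Under these assumptions, the three named optimizers differ only in the `norm` argument: passing `"linf"` gives a sign-based update, `"l2"` a normalized momentum update, and `"spectral"` an orthogonalized matrix update.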

IMPACT Provides theoretical insights into optimizer behavior, potentially guiding future model training strategies.

RANK_REASON This is a research paper published on arXiv detailing theoretical analysis and experimental results of optimizers in neural networks.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.LG TIER_1 English(EN) · Eitan Gronich, Gal Vardi · 2026-05-26 04:00

The Implicit Bias of Adam and Muon on Smooth Homogeneous Neural Networks

arXiv:2602.16340v3 Announce Type: replace Abstract: We study the implicit bias of momentum-based optimizers on smooth homogeneous models. We show that momentum steepest descent algorithms like Muon (spectral norm), MomentumGD ($\ell_2$ norm), and Signum ($\ell_\infty$ no…
