The Implicit Bias of Adam and Muon on Smooth Homogeneous Neural Networks
Researchers have analyzed the implicit bias of momentum-based optimizers such as Adam and Muon when training smooth homogeneous neural networks. Their findings indicate that momentum steepest descent algorithms, including Muon, MomentumGD, and Signum, follow approximate steepest descent trajectories under specific learning rate schedules. As a result, these algorithms are biased toward KKT points of the corresponding margin maximization problem, with Adam specifically maximizing the L-infinity margin.
AI IMPACT: Provides theoretical insight into optimizer behavior that could guide future model training strategies.
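To make the "steepest descent" framing in the summary concrete, here is a minimal sketch in plain Python of how the choice of norm changes the update direction. The L-infinity case gives the coordinate-wise sign of the gradient, and the L2 case gives the normalized gradient. The function names (`steepest_direction_linf`, `steepest_direction_l2`, `signum_step`) and the default `beta` are illustrative assumptions, not taken from the paper. Muon's matrix orthogonalization step is omitted here for brevity.

```python
import math


def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def steepest_direction_linf(grad):
    """Direction d with max|d_i| <= 1 that maximizes <grad, d>.

    Under the L-infinity norm this is the coordinate-wise sign of the gradient.
    """
    return [float(sign(g)) for g in grad]


def steepest_direction_l2(grad):
    """Direction d with ||d||_2 <= 1 that maximizes <grad, d>.

    Under the L2 norm this is the gradient divided by its Euclidean norm.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [g / norm for g in grad]


def signum_step(params, grad, momentum, lr, beta=0.9):
    """One Signum-style update (hypothetical helper, beta is an assumed default).

    The momentum buffer is an exponential moving average of gradients,
    and the parameters move along the sign of that buffer.
    """
    new_momentum = [beta * m + (1.0 - beta) * g for m, g in zip(momentum, grad)]
    direction = steepest_direction_linf(new_momentum)
    new_params = [p - lr * d for p, d in zip(params, direction)]
    return new_params, new_momentum
```

With this sketch, every coordinate in a Signum-style step moves by exactly `lr` regardless of gradient magnitude. That uniform per-coordinate step is the intuition behind associating sign-based updates with the L-infinity geometry.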