Brief · PulseAugur

RESEARCH · arXiv cs.LG English(EN) · 3w · [215 sources]

Accelerated Gradient Descent for Faster Convergence with Minimal Overhead

Researchers have introduced OptMuon, an adaptive momentum orthogonalization method for stochastic nonconvex optimization that calibrates update magnitudes from the observed optimization trajectory. It combines Muon-style update directions with a trajectory-dependent coefficient schedule, so it does not rely on knowing smoothness constants or variance levels. OptMuon comes with theoretical guarantees of noise adaptivity and zero-noise optimality: when the noise vanishes, it reduces to a near-optimal deterministic rate, without manual hyperparameter tuning.
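The general idea of pairing an orthogonalized momentum direction with a step size computed from the observed trajectory can be sketched as follows. This is an illustrative sketch only, not OptMuon's actual algorithm: the cubic Newton-Schulz iteration stands in for the Muon-style orthogonalization, an AdaGrad-Norm-style coefficient stands in for the paper's schedule, and the names `orthogonalize`, `init_state`, and `muon_style_step` are hypothetical.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def fro_norm(a):
    return math.sqrt(sum(v * v for row in a for v in row))

def orthogonalize(m, steps=30, eps=1e-12):
    """Approximate the orthogonal polar factor of m.

    Uses the cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X after
    scaling m by its Frobenius norm, so all singular values start in (0, 1].
    """
    scale = fro_norm(m) + eps
    x = [[v / scale for v in row] for row in m]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * p - 0.5 * q for p, q in zip(rp, rq)]
             for rp, rq in zip(x, xxtx)]
    return x

def init_state(rows, cols):
    """Optimizer state: momentum buffer and running sum of squared gradient norms."""
    return {"m": [[0.0] * cols for _ in range(rows)], "g2": 0.0}

def muon_style_step(w, grad, state, beta=0.9, eps=1e-12):
    """One update: orthogonalized momentum direction, trajectory-based step size.

    ASSUMPTION: the coefficient 1 / sqrt(sum of squared gradient norms) is an
    AdaGrad-Norm-style stand-in, not the schedule from the OptMuon paper.
    """
    # Exponential moving average of gradients (momentum).
    state["m"] = [[beta * mv + (1.0 - beta) * gv for mv, gv in zip(rm, rg)]
                  for rm, rg in zip(state["m"], grad)]
    # Accumulate observed gradient magnitudes; no smoothness or variance constant is supplied.
    state["g2"] += fro_norm(grad) ** 2
    coeff = 1.0 / math.sqrt(state["g2"] + eps)
    direction = orthogonalize(state["m"])
    return [[wv - coeff * dv for wv, dv in zip(rw, rd)]
            for rw, rd in zip(w, direction)]
```

The step size here depends only on gradients seen so far, which mirrors the summary's point that no smoothness constant or variance level has to be provided up front.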

IMPACT Tuning-free, noise-adaptive optimizers of this kind could accelerate training and improve performance in large-scale machine learning models.

Adam
CT-AGD
Panayotis Mertikopoulos
Thomas Nagler
Zijian Liu
AdamW
Krishnakumar Balasubramanian
SGD
AdaGrad
Hugging Face
arXiv
BBVI
Conic Particle Gradient Descent
AdaGrad-Norm
Polyak Heavy-Ball
Wasserstein VI
Fast Spawn&Prune
reinforcement learning
LLM
Pandora's Box Gittins Index
MWGraD
SAC-Opt
A-MWGraD
Pareto stationarity
Wasserstein space
Bayesian optimization
RMSProp
Tikhonov regularization
logistic regression
OptMuon
Muon