Accelerated Gradient Descent for Faster Convergence with Minimal Overhead
Researchers have introduced OptMuon, a novel adaptive momentum orthogonalization method for stochastic nonconvex optimization that calibrates update magnitudes from observed trajectories. This approach combines Muon-style directions with a trajectory-dependent coefficient schedule, avoiding reliance on smoothness constants or variance levels. OptMuon offers theoretical guarantees for noise adaptivity and zero-noise optimality, reducing to a near-optimal deterministic rate without manual hyperparameter tuning.
AI IMPACT: Introduces advanced optimization techniques that could accelerate training and improve performance in large-scale machine learning models.
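To make the idea of a Muon-style direction with a trajectory-dependent step size concrete, here is a minimal sketch in Python. It is not the OptMuon algorithm: the summary does not give the actual coefficient schedule, so an AdaGrad-Norm-style step size (scaled by the accumulated squared gradient norms) is swapped in as a stand-in. The orthogonalization uses an exact SVD for clarity, whereas Muon implementations typically approximate it with a Newton-Schulz iteration. The names `orthogonalize`, `sketch_step`, `beta`, and `eta0` are hypothetical and chosen only for illustration.

```python
import numpy as np


def orthogonalize(m):
    """Return the semi-orthogonal polar factor of m, computed by SVD."""
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt


def sketch_step(w, grad, state, beta=0.9, eta0=1.0, eps=1e-12):
    """One illustrative update: orthogonalized momentum, adaptive step size.

    Assumption: the step size follows an AdaGrad-Norm-style rule. This is a
    stand-in for the trajectory-dependent schedule, not the published one.
    """
    # Exponential moving average of gradients (momentum buffer).
    state["m"] = beta * state["m"] + (1.0 - beta) * grad
    # Accumulate squared gradient norms observed along the trajectory.
    state["acc"] += float(np.sum(grad * grad))
    # Step size is set from observed gradients only, with no smoothness
    # constant or noise level supplied by the user.
    step = eta0 / np.sqrt(state["acc"] + eps)
    return w - step * orthogonalize(state["m"])


# Toy usage: minimize 0.5 * ||W - T||_F^2 for a fixed random target T.
rng = np.random.default_rng(0)
target = rng.standard_normal((4, 3))
w = np.zeros((4, 3))
state = {"m": np.zeros((4, 3)), "acc": 0.0}
for _ in range(50):
    grad = w - target  # exact gradient of the toy objective
    w = sketch_step(w, grad, state)
```

The point of the sketch is the division of labor: the orthogonalized momentum fixes the direction of the update, while a scalar computed from the observed gradients fixes its magnitude.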
- Adam
- CT-AGD
- Panayotis Mertikopoulos
- Thomas Nagler
- Zijian Liu
- AdamW
- Krishnakumar Balasubramanian
- SGD
- AdaGrad
- Hugging Face
- arXiv
- BBVI
- Conic Particle Gradient Descent
- AdaGrad-Norm
- Polyak Heavy-Ball
- Wasserstein VI
- Fast Spawn&Prune
- reinforcement learning
- LLM
- Pandora's Box Gittins Index
- MWGraD
- SAC-Opt
- A-MWGraD
- Pareto stationarity
- Wasserstein space
- Bayesian optimization
- RMSProp
- Tikhonov regularization
- logistic regression
- OptMuon
- Muon