New research explores faster convergence and noise handling in ML optimization

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 13 sources

Researchers have published several papers exploring advanced optimization techniques for machine learning. One paper introduces Curvature-Tuned Accelerated Gradient Descent (CT-AGD), which reduces training epochs by an average of 33% with minimal overhead. Another study investigates the convergence of adaptive gradient methods like AdaGrad under heavy-tailed noise, providing the first provable convergence rate for AdaGrad in non-convex optimization with such noise. Additionally, a paper analyzes the long-run distribution of stochastic gradient descent (SGD), likening it to a Boltzmann-Gibbs distribution where temperature relates to the step-size.
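The step-size-as-temperature picture can be checked numerically on a toy problem. The sketch below is an illustration under strong assumptions that are not taken from the paper: a one-dimensional quadratic f(x) = a·x²/2, additive Gaussian gradient noise with standard deviation σ, and a constant step size η. In that setting the SGD iterates follow a linear recursion whose stationary variance is ησ²/(a(2 − ηa)), which matches a Gibbs density proportional to exp(−f(x)/T) with temperature T ≈ ησ²/2 for small η. All function and parameter names are hypothetical.

```python
import random


def sgd_stationary_variance(step_size, curvature=1.0, noise_std=1.0,
                            n_steps=200_000, burn_in=10_000, seed=0):
    """Run constant-step SGD on f(x) = curvature * x**2 / 2 with additive
    Gaussian gradient noise; return the empirical variance after burn-in.

    Toy setting chosen for illustration, not the general non-convex case.
    """
    rng = random.Random(seed)
    x = 0.0
    total, total_sq, count = 0.0, 0.0, 0
    for step in range(n_steps):
        # Noisy gradient: exact gradient of the quadratic plus Gaussian noise.
        noisy_grad = curvature * x + rng.gauss(0.0, noise_std)
        x -= step_size * noisy_grad
        if step >= burn_in:
            total += x
            total_sq += x * x
            count += 1
    mean = total / count
    return total_sq / count - mean * mean


def predicted_variance(step_size, curvature=1.0, noise_std=1.0):
    """Exact stationary variance of x <- (1 - eta*a) * x - eta * noise."""
    return step_size * noise_std ** 2 / (curvature * (2.0 - step_size * curvature))


if __name__ == "__main__":
    for eta in (0.05, 0.1, 0.2):
        print(eta, sgd_stationary_variance(eta), predicted_variance(eta))
```

In this toy setting, doubling the step size roughly doubles the spread of the iterates around the minimizer, which is the sense in which the step size acts like a temperature.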

IMPACT Advances in optimization methods can lead to more efficient training of machine learning models, reducing computational costs and time.

RANK_REASON Multiple arXiv papers published on optimization techniques for machine learning.

COVERAGE [13]

arXiv cs.LG TIER_1 · Jalal Etesami · 2026-05-19 11:00

Convergence of Consensus-Based Particle Methods for Nonconvex Bi-Level Optimization

In this paper, we study a consensus-based optimization method for nonconvex bi-level optimization, where the objective is to minimize an upper-level function over the set of global minimizers of a lower-level problem. The proposed approach is derivative-free, and constructs its c…
Hugging Face Daily Papers TIER_1 · 2026-05-19 11:00

Convergence of Consensus-Based Particle Methods for Nonconvex Bi-Level Optimization

In this paper, we study a consensus-based optimization method for nonconvex bi-level optimization, where the objective is to minimize an upper-level function over the set of global minimizers of a lower-level problem. The proposed approach is derivative-free, and constructs its c…
arXiv cs.LG TIER_1 · Frank Liu · 2026-05-15 14:50

Accelerated Gradient Descent for Faster Convergence with Minimal Overhead

In this paper, we present CT-AGD (Curvature-Tuned Accelerated Gradient Descent), an optimization method for non-convex optimization problems in deep learning training tasks. CT-AGD is a general boosting procedure that accelerates first-order methods by explicitly capturing the lo…
arXiv stat.ML TIER_1 · Sharan Sahu, Cameron J. Hogan, Martin T. Wells · 2026-05-20 04:00

On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization

arXiv:2601.12238v4. In this paper, we provide a comprehensive theoretical analysis of Stochastic Gradient Descent (SGD) and its momentum variants (Polyak Heavy-Ball and Nesterov) for tracking time-varying optima under strong convexity and smoothnes…
arXiv stat.ML TIER_1 · Kyurae Kim, Qiang Fu, Yi-An Ma, Jacob R. Gardner, Trevor Campbell · 2026-05-20 04:00

Stochastic Gradient Variational Inference with Price's Gradient Estimator from Bures-Wasserstein to Parameter Space

arXiv:2602.18718v2. For approximating a target distribution given only its unnormalized log-density, stochastic gradient-based variational inference (VI) algorithms are a popular approach. For example, Wasserstein VI (WVI) and black-box VI (BBVI) p…
arXiv stat.ML TIER_1 · Yohann De Castro (ICJ, ECL, IUF, PSPM), Sébastien Gadat (TSE-R, IUF), Clément Marteau (ICJ, UCBL, PSPM) · 2026-05-20 04:00

Fast Spawn&Prune (FS&P): Global convergence of stochastic conic particle gradient descent via birth/death process

arXiv:2605.19784v1. We investigate the global optimization of the objective function arising in continuous sparse regression, specifically the Beurling LASSO (BLASSO), over the space of measures. While Conic Particle Gradient Descent (CPGD) methods a…
arXiv stat.ML TIER_1 · Clément Marteau · 2026-05-19 12:50

Fast Spawn&Prune (FS&P): Global convergence of stochastic conic particle gradient descent via birth/death process

We investigate the global optimization of the objective function arising in continuous sparse regression, specifically the Beurling LASSO (BLASSO), over the space of measures. While Conic Particle Gradient Descent (CPGD) methods are computationally efficient, they may become trap…
arXiv stat.ML TIER_1 · Zijian Liu · 2026-05-19 04:00

Can Adaptive Gradient Methods Converge under Heavy-Tailed Noise? A Case Study of AdaGrad

arXiv:2605.18694v1. Many tasks in modern machine learning are observed to involve heavy-tailed gradient noise during the optimization process. To manage this realistic and challenging setting, new mechanisms, such as gradient clipping and gradient no…
arXiv stat.ML TIER_1 · Ye He, Krishnakumar Balasubramanian, Sayan Banerjee, Promit Ghosal · 2026-05-19 04:00

Finite-Particle Rates for Regularized Stein Variational Gradient Descent

arXiv:2602.05172v2. We derive finite-particle rates for the regularized Stein variational gradient descent (R-SVGD) algorithm introduced by He et al. (2024) that corrects the constant-order bias of the SVGD by applying a resolvent-type precondition…
arXiv stat.ML TIER_1 · Tobias Brock, Thomas Nagler · 2026-05-19 04:00

Fast Rates for Nonstationary Weighted Risk Minimization

arXiv:2602.05742v2. Weighted empirical risk minimization is a common approach to prediction under distribution drift. This article studies its out-of-sample prediction error under nonstationarity. We provide a general decomposition of the excess ri…
arXiv stat.ML TIER_1 · Waïss Azizian, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos · 2026-05-19 04:00

What is the long-run distribution of stochastic gradient descent? A large deviations analysis

arXiv:2406.09241v3. In this paper, we examine the long-run distribution of stochastic gradient descent (SGD) in general, non-convex problems. Specifically, we seek to understand which regions of the problem's state space are more likely to be…
arXiv stat.ML TIER_1 · Zijian Liu · 2026-05-19 04:00

Clipped Gradient Methods for Nonsmooth Convex Optimization under Heavy-Tailed Noise: A Refined Analysis

arXiv:2512.23178v3. Optimization under heavy-tailed noise has become popular recently, since it better fits many modern machine learning tasks, as captured by empirical observations. Concretely, instead of a finite second moment on gradient n…
arXiv stat.ML TIER_1 · Zijian Liu · 2026-05-18 17:30

Can Adaptive Gradient Methods Converge under Heavy-Tailed Noise? A Case Study of AdaGrad

Many tasks in modern machine learning are observed to involve heavy-tailed gradient noise during the optimization process. To manage this realistic and challenging setting, new mechanisms, such as gradient clipping and gradient normalization, have been introduced to ensure the co…
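
Several of the entries above concern AdaGrad and gradient clipping. As a rough orientation for readers, the sketch below shows the textbook forms of both: clipping a gradient to a maximum Euclidean norm, and a diagonal AdaGrad step that divides each coordinate by the square root of its accumulated squared gradients. These are standard formulations written for illustration, not the exact algorithms or assumptions analyzed in the listed papers, and the function names, default learning rate, and epsilon are hypothetical.

```python
import math


def clip_by_norm(grad, max_norm):
    """Rescale grad so its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def adagrad_step(params, grad, accum, lr=0.1, eps=1e-8):
    """One diagonal AdaGrad update; returns (new_params, new_accum).

    accum holds the running sum of squared gradients per coordinate,
    so coordinates with large past gradients take smaller steps.
    """
    new_accum = [a + g * g for a, g in zip(accum, grad)]
    new_params = [
        p - lr * g / (math.sqrt(a) + eps)
        for p, g, a in zip(params, grad, new_accum)
    ]
    return new_params, new_accum
```

Clipping bounds the size of any single update regardless of how large a noisy gradient is, while AdaGrad shrinks its effective step as squared gradients accumulate; whether the latter alone suffices under heavy-tailed noise is the question the AdaGrad paper's title poses.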
