PulseAugur

New research explores advanced optimization techniques for machine learning

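Several recent research papers explore advanced optimization techniques for machine learning. One paper introduces a derivative-free, consensus-based method for nonconvex bi-level optimization and proves convergence guarantees for its mean-field and finite-particle approximations. Another proposes Curvature-Tuned Accelerated Gradient Descent (CT-AGD), which captures local curvature and cuts the number of training epochs on deep learning tasks by 33% on average. Further work studies stochastic approximation algorithms under heavy-tailed noise, analyzing concentration bounds and how the noise shapes the tails of the error. Other papers examine stochastic gradient variational inference, the global convergence of stochastic conic particle gradient descent, and the suboptimality of momentum SGD in nonstationary environments.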
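For readers unfamiliar with accelerated gradient descent, the idea can be illustrated with classical Nesterov momentum. The sketch below is not CT-AGD, whose update rule is not described in this summary; it is a minimal textbook example on an ill-conditioned quadratic. The test function, step size, and momentum value are assumptions chosen for illustration.

```python
# Illustrative only: classical Nesterov momentum vs. plain gradient descent.
# This is NOT the CT-AGD method from the paper; the objective and all
# hyperparameters below are assumptions picked for the demonstration.

def f(x):
    # Ill-conditioned quadratic: curvature 1 along x[0], 100 along x[1].
    return 0.5 * (x[0] ** 2 + 100.0 * x[1] ** 2)

def grad(x):
    # Exact gradient of f.
    return [x[0], 100.0 * x[1]]

def gradient_descent(x0, lr, steps):
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

def nesterov(x0, lr, momentum, steps):
    x = list(x0)
    v = [0.0] * len(x)  # velocity (accumulated update direction)
    for _ in range(steps):
        # Evaluate the gradient at the look-ahead point x + momentum * v.
        look = [xi + momentum * vi for xi, vi in zip(x, v)]
        g = grad(look)
        v = [momentum * vi - lr * gi for vi, gi in zip(v, g)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x

if __name__ == "__main__":
    x0 = [1.0, 1.0]
    lr = 0.01              # 1 / L, with L = 100 the largest curvature
    beta = 9.0 / 11.0      # (sqrt(kappa) - 1) / (sqrt(kappa) + 1), kappa = 100
    print("GD loss:      ", f(gradient_descent(x0, lr, 200)))
    print("Nesterov loss:", f(nesterov(x0, lr, beta, 200)))
```

With the same step size and iteration budget, the momentum variant reaches a much lower loss on this problem, which is the kind of speed-up that accelerated methods aim for.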

Impact: Advances in optimization algorithms are essential to improving the efficiency and performance of machine learning models.

Ranking rationale: This cluster contains multiple academic papers on optimization techniques for machine learning.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 43 sources. How we write summaries →

Sources [43]

  1. arXiv cs.AI TIER_1 English(EN) · Haoyu Huang, Boyu Liu, Linlin Yang, Yanjing Li, Yuguang Yang, Xuhui Liu, Canyu Chen, Zhongqian Fu, Baochang Zhang ·

    SURGE: Surrogate Gradient Adaptation in Binary Neural Networks

    arXiv:2605.10989v3 Announce Type: replace-cross Abstract: The training of Binary Neural Networks (BNNs) is fundamentally based on gradient approximation for non-differentiable binarization operations (e.g., sign function). However, prevailing methods including the Straight-Throug…

  2. arXiv cs.AI TIER_1 English(EN) · Huangyu Xu, Jingqin Yang, Qianqian Xu, Jiaye Teng ·

    Theoretical Analysis of Sparse Optimization with Reparameterization, Weight Decay, and Adaptive Learning Rate

    arXiv:2605.25134v1 Announce Type: cross Abstract: Sparse optimization is a fundamental challenge in various practical applications. A popular approach to sparse optimization is $\ell_p$ regularization. However, it may encounter optimization instability due to the unbounded gradie…

  3. arXiv cs.AI TIER_1 English(EN) · Chinmay Maheshwari, Chinmay Pimpalkhare, Debasish Chatterjee ·

    EXOTIC: An Exact, Optimistic, Tree-Based Algorithm for Min-Max Optimization

    arXiv:2508.12479v2 Announce Type: replace-cross Abstract: Min-max optimization arises in many domains such as game theory, adversarial machine learning, etc. For these problems, gradient-based methods are well understood and enjoy strong guarantees. However, in the absence of con…

  4. arXiv cs.LG TIER_1 English(EN) · Matan Schliserman, Shira Vansover-Hager, Tomer Koren ·

    Flat Minima and Generalization: Insights from Stochastic Convex Optimization

    arXiv:2511.03548v2 Announce Type: replace Abstract: Understanding the generalization behavior of learning algorithms is a central goal of learning theory. A recently emerging explanation is that learning algorithms are successful in practice because they converge to flat minima, …

  5. arXiv cs.LG TIER_1 English(EN) · Enea Monzio Compagnoni, Rustem Islamov, Frank Norbert Proske, Aurelien Lucchi, Antonio Orvieto, Eduard Gorbunov ·

    On the Interaction of Batch Noise, Adaptivity, and Compression, under $(L_0,L_1)$-Smoothness: An SDE Approach

    arXiv:2506.00181v2 Announce Type: replace Abstract: Distributed stochastic optimization intertwines (i) stochastic gradient noise, (ii) communication compression, and (iii) adaptive/normalized updates. While each factor has been studied in isolation, their joint effect under real…

  6. arXiv cs.LG TIER_1 English(EN) · Jose Blanchet, Peter Glynn, Wenhao Yang ·

    Statistical Inference for Stochastic Gradient Descent Beyond Finite Variance

    arXiv:2605.26000v1 Announce Type: cross Abstract: Stochastic gradient descent (SGD) is a foundational algorithm for large-scale statistical learning and stochastic optimization. However, statistical inference based on SGD iterates remains challenging when stochastic gradients hav…

  7. arXiv cs.LG TIER_1 English(EN) · Khen Cohen, Mark Glass, Meir Feder, Yaron Oz ·

    Implicit Binarization via Complex Phase Dynamics in Combinatorial Optimization

    arXiv:2605.24502v1 Announce Type: cross Abstract: We introduce a physics-inspired continuous relaxation framework that yields substantially improved solutions for NP-hard combinatorial optimization problems, including Quadratic Unconstrained Binary Optimization (QUBO), binary spa…

  8. arXiv cs.LG TIER_1 English(EN) · Chung-Yiu Yau, Dawei Li, Athanasios Glentis, Valentyn Boreiko, Hoi-To Wai, Mingyi Hong ·

    EMA-Nesterov: Stabilizing Nesterov's Lookahead for Accelerated Deep Learning Optimization

    arXiv:2605.25395v1 Announce Type: new Abstract: Lookahead-based acceleration methods, such as Nesterov's momentum, are widely used in optimization, but they often become unreliable in deep learning training mainly due to stochastic gradient noise and non-convex loss landscapes. I…

  9. arXiv cs.LG TIER_1 English(EN) · Yudong W. Xu, Wenhao Li, Xiaoyu Wang, Scott Sanner, Elias B. Khalil ·

    Blocked Gibbs meets Diffusion Transformers: Unsupervised Learning for Constraint Optimization

    arXiv:2605.25129v1 Announce Type: new Abstract: Diffusion models have shown promise in learning to solve constraint optimization problems. However, they are mostly restricted to problems with binary variables and rely on graph neural networks, hindering their application to a bro…

  10. arXiv cs.LG TIER_1 English(EN) · Ziyue Chen, David Šiška, Lukasz Szpruch ·

    Global linear convergence of entropy-regularized softmax policy gradient beyond tabular MDPs

    arXiv:2605.24939v1 Announce Type: new Abstract: We study the global convergence of policy gradient for infinite-horizon entropy-regularized Markov decision processes (MDPs) with continuous state and action spaces. We consider log-linear softmax policies with linear function appro…

  11. arXiv cs.LG TIER_1 English(EN) · Zhuanghua Liu, Luo Luo ·

    Zeroth-Order Nonconvex Nonsmooth Optimization with Heavy-Tailed Noise

    arXiv:2605.24513v1 Announce Type: new Abstract: This paper considers the nonconvex nonsmooth problem in which the objective function is Lipschitz continuous. We focus on the stochastic setting where the algorithm can access stochastic function value evaluations with heavy-tailed …

  12. arXiv cs.AI TIER_1 English(EN) · Chen Liang, Xiatao Sun, Qian Wang, Daniel Rakita ·

    Turning Stale Gradients into Stable Gradients: Coherent Coordinate Descent with Implicit Landscape Smoothing for Lightweight Zeroth-Order Optimization

    arXiv:2605.14373v2 Announce Type: replace-cross Abstract: Zeroth-Order (ZO) optimization is pivotal for scenarios where backpropagation is unavailable, such as memory-constrained on-device learning and black-box optimization. However, existing methods face a stark trade-off: they…

  13. arXiv cs.LG TIER_1 English(EN) · Yequan Zhao, Ruijie Zhang, Liyan Tan, Niall Moran, Tong Qin, Zheng Zhang ·

    FuRA: Full-Rank Parameter-Efficient Fine-Tuning with Spectral Preconditioning

    arXiv:2605.22869v1 Announce Type: new Abstract: Both full fine-tuning (Full FT) and parameter-efficient fine-tuning methods such as LoRA introduce weight updates without accounting for the spectral structure established during pretraining. As a result, noisy gradients from limite…

  14. Hugging Face Daily Papers TIER_1 English(EN) ·

    EMA-Nesterov: Stabilizing Nesterov's Lookahead for Accelerated Deep Learning Optimization

    Lookahead-based acceleration methods, such as Nesterov's momentum, are widely used in optimization, but they often become unreliable in deep learning training mainly due to stochastic gradient noise and non-convex loss landscapes. In particular, standard lookahead relies on short…

  15. arXiv cs.LG TIER_1 English(EN) · Alexander Tyurin ·

    Near-Optimal Convergence of Accelerated Gradient Methods under Generalized and $(L_0, L_1)$-Smoothness

    arXiv:2508.06884v2 Announce Type: replace-cross Abstract: We study first-order methods for convex optimization problems with functions $f$ satisfying the recently proposed $\ell$-smoothness condition $\|\nabla^{2} f(x)\| \le \ell\left(\|\nabla f(x)\|\right)$, which generalizes the…

  16. arXiv cs.LG TIER_1 English(EN) · Ryan Cory-Wright, Jean Pauphilet ·

    Compact Lifted Relaxations for Low-Rank Optimization

    arXiv:2603.20228v2 Announce Type: replace-cross Abstract: We develop tractable convex relaxations for rank-constrained quadratic optimization problems over $n \times m$ matrices, a setting for which tractable relaxations are typically only available when the objective or constrai…

  17. arXiv cs.LG TIER_1 English(EN) · Zhuo Chen (equal contribution), Xinzhe Yuan (equal contribution), Jianshu Zhang (Shanghai Artificial Intelligence Laboratory, Shanghai, China, School of Computer Science, Shanghai Jiao Tong University, Shanghai, China), Jinzong Dong (Shanghai Artificial … ·

    LABO: LLM-Accelerated Bayesian Optimization through Broad Exploration and Selective Experimentation

    arXiv:2605.22054v1 Announce Type: new Abstract: The high cost and data scarcity in scientific exploration have motivated the use of large language models (LLMs) as knowledge-driven components in Bayesian optimization (BO). However, existing approaches typically embed LLMs directl…

  18. Hugging Face Daily Papers TIER_1 English(EN) ·

    Ada2MS: A Hybrid Optimization Algorithm Based on Exponential Mixing of Elementwise and Global Second-Moment Estimates

    Optimization algorithms are core methods by which machine learning models iteratively minimize loss functions, update parameters, learn from data, and improve performance. Momentum SGD and AdamW represent two important optimization paradigms. AdamW produces stable updates and usu…

  19. Hugging Face Daily Papers TIER_1 English(EN) ·

    Convergence of Consensus-Based Particle Methods for Nonconvex Bi-Level Optimization

    In this paper, we study a consensus-based optimization method for nonconvex bi-level optimization, where the objective is to minimize an upper-level function over the set of global minimizers of a lower-level problem. The proposed approach is derivative-free, and constructs its c…

  20. arXiv cs.LG TIER_1 English(EN) · Jalal Etesami ·

    Convergence of Consensus-Based Particle Methods for Nonconvex Bi-Level Optimization

    In this paper, we study a consensus-based optimization method for nonconvex bi-level optimization, where the objective is to minimize an upper-level function over the set of global minimizers of a lower-level problem. The proposed approach is derivative-free, and constructs its c…

  21. arXiv cs.NE (Neural & Evolutionary) TIER_1 English(EN) · Shinichi Shirakawa ·

    Adaptive Stochastic Natural Gradient Method for Safe Optimization on Binary Space

    Optimization problems in real-world applications across the medical and engineering domains often involve potential risks when evaluating candidate solutions. Safe optimization aims to perform optimization while suppressing unsafe solution evaluations in such situations. For cont…

  22. arXiv cs.LG TIER_1 English(EN) · Frank Liu ·

    Accelerated Gradient Descent for Faster Convergence with Minimal Overhead

    In this paper, we present CT-AGD (Curvature-Tuned Accelerated Gradient Descent), an optimization method for non-convex optimization problems in deep learning training tasks. CT-AGD is a general boosting procedure that accelerates first-order methods by explicitly capturing the lo…

  23. arXiv stat.ML TIER_1 English(EN) · Navil Nandhan, Abbas Khademi, Antonio Silveti-Falls ·

    Boosted Stochastic Frank-Wolfe for Constrained Nonconvex Optimization

    arXiv:2605.25255v1 Announce Type: cross Abstract: The boosted Frank-Wolfe algorithm accelerates the classical Frank-Wolfe algorithm by better aligning the update direction with the negative gradient. Its analysis, however, has been limited to deterministic convex problems, with s…

  24. arXiv stat.ML TIER_1 English(EN) · Wenhao Yang ·

    Statistical Inference for Stochastic Gradient Descent Beyond Finite Variance

    Stochastic gradient descent (SGD) is a foundational algorithm for large-scale statistical learning and stochastic optimization. However, statistical inference based on SGD iterates remains challenging when stochastic gradients have infinite variance, as the relevant limiting dist…

  25. arXiv stat.ML TIER_1 English(EN) · Antonio Silveti-Falls ·

    Boosted Stochastic Frank-Wolfe for Constrained Nonconvex Optimization

    The boosted Frank-Wolfe algorithm accelerates the classical Frank-Wolfe algorithm by better aligning the update direction with the negative gradient. Its analysis, however, has been limited to deterministic convex problems, with step sizes that require either line search or knowl…

  26. arXiv stat.ML TIER_1 English(EN) · Krishnakumar Balasubramanian ·

    Finite-Particle Convergence Rates for Conservative and Non-Conservative Drifting Models

    arXiv:2605.22795v1 Announce Type: new Abstract: We propose and analyze a conservative drifting method for one-step generative modeling. The method replaces the original displacement-based drifting velocity by a kernel density estimator (KDE)-gradient velocity, namely the differen…

  27. arXiv cs.CV TIER_1 English(EN) · Gang Dai, Yining Huang, Yiming Xia, Guohao Chen, Shuaicheng Niu ·

    Guided Trajectory Optimization with Sparse Scaling for Test-Time Diffusion

    arXiv:2605.21907v1 Announce Type: new Abstract: The efficient Test-Time Scaling (TTS) paradigm offers a promising perspective for enhancing the generation performance of diffusion models. However, current solutions are limited to a static, pre-defined noise pool and suffer from i…

  28. arXiv stat.ML TIER_1 English(EN) · Krishnakumar Balasubramanian ·

    Finite-Particle Convergence Rates for Conservative and Non-Conservative Drifting Models

    We propose and analyze a conservative drifting method for one-step generative modeling. The method replaces the original displacement-based drifting velocity by a kernel density estimator (KDE)-gradient velocity, namely the difference of the kernel-smoothed data score and the ker…

  29. arXiv stat.ML TIER_1 English(EN) · Tansheng Zhu, Hongyu Zhou, Ke Jin, Xusheng Xu, Qiufan Yuan, Lijie Ji ·

    Bayesian Optimization by Kernel Regression and Density-based Exploration

    arXiv:2502.06178v5 Announce Type: replace-cross Abstract: Bayesian optimization is highly effective for optimizing expensive-to-evaluate black-box functions, but it faces significant computational challenges due to the cubic per-iteration cost of Gaussian processes, which results…

  30. arXiv stat.ML TIER_1 English(EN) · Shubhada Agrawal, Siva Theja Maguluri, Martin Zubeldia ·

    Concentration of General Stochastic Approximation Under Heavy-Tailed Markovian Noise

    arXiv:2605.20999v1 Announce Type: cross Abstract: We establish maximal concentration bounds for the iterates generated by stochastic approximation algorithms with general step sizes, where the noise has a finite-state Markovian component plus a Martingale-difference component. Wh…

  31. arXiv stat.ML TIER_1 English(EN) · Fares El Khoury, Houssam Zenati, Nathan Kallus, Michael Arbel, Aurélien Bibaut ·

    Semiparametric Efficient Bilevel Gradient Estimation

    arXiv:2605.21341v1 Announce Type: new Abstract: Functional bilevel methods estimate a lower-level function and plug it into a hypergradient, but this plug-in gradient can retain first-order bias when the lower-level problem is learned nonparametrically. To remove this bias, we de…

  32. arXiv stat.ML TIER_1 English(EN) · Aurélien Bibaut ·

    Semiparametric Efficient Bilevel Gradient Estimation

    Functional bilevel methods estimate a lower-level function and plug it into a hypergradient, but this plug-in gradient can retain first-order bias when the lower-level problem is learned nonparametrically. To remove this bias, we develop a semiparametric debiasing theory for popu…

  33. arXiv stat.ML TIER_1 English(EN) · Martin Zubeldia ·

    Concentration of General Stochastic Approximation Under Heavy-Tailed Markovian Noise

    We establish maximal concentration bounds for the iterates generated by stochastic approximation algorithms with general step sizes, where the noise has a finite-state Markovian component plus a Martingale-difference component. When the Martingale-difference noise is bounded, we …

  34. arXiv stat.ML TIER_1 English(EN) · Kyurae Kim, Qiang Fu, Yi-An Ma, Jacob R. Gardner, Trevor Campbell ·

    Stochastic Gradient Variational Inference with Price's Gradient Estimator from Bures-Wasserstein to Parameter Space

    arXiv:2602.18718v2 Announce Type: replace Abstract: For approximating a target distribution given only its unnormalized log-density, stochastic gradient-based variational inference (VI) algorithms are a popular approach. For example, Wasserstein VI (WVI) and black-box VI (BBVI) p…

  35. arXiv stat.ML TIER_1 English(EN) · Sharan Sahu, Cameron J. Hogan, Martin T. Wells ·

    On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization

    arXiv:2601.12238v4 Announce Type: replace Abstract: In this paper, we provide a comprehensive theoretical analysis of Stochastic Gradient Descent (SGD) and its momentum variants (Polyak Heavy-Ball and Nesterov) for tracking time-varying optima under strong convexity and smoothnes…

  36. arXiv stat.ML TIER_1 English(EN) · Yohann De Castro (ICJ, ECL, IUF, PSPM), Sébastien Gadat (TSE-R, IUF), Clément Marteau (ICJ, UCBL, PSPM) ·

    Fast Spawn&Prune (FS&P): Global convergence of stochastic conic particle gradient descent via birth/death process

    arXiv:2605.19784v1 Announce Type: cross Abstract: We investigate the global optimization of the objective function arising in continuous sparse regression, specifically the Beurling LASSO (BLASSO), over the space of measures. While Conic Particle Gradient Descent (CPGD) methods a…

  37. arXiv stat.ML TIER_1 English(EN) · Clément Marteau ·

    Fast Spawn&Prune (FS&P): Global convergence of stochastic conic particle gradient descent via birth/death process

    We investigate the global optimization of the objective function arising in continuous sparse regression, specifically the Beurling LASSO (BLASSO), over the space of measures. While Conic Particle Gradient Descent (CPGD) methods are computationally efficient, they may become trap…

  38. arXiv stat.ML TIER_1 English(EN) · Waïss Azizian, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos ·

    What is the long-run distribution of stochastic gradient descent? A large deviations analysis

    arXiv:2406.09241v3 Announce Type: replace-cross Abstract: In this paper, we examine the long-run distribution of stochastic gradient descent (SGD) in general, non-convex problems. Specifically, we seek to understand which regions of the problem's state space are more likely to be…

  39. arXiv stat.ML TIER_1 English(EN) · Tobias Brock, Thomas Nagler ·

    Fast Rates for Nonstationary Weighted Risk Minimization

    arXiv:2602.05742v2 Announce Type: replace Abstract: Weighted empirical risk minimization is a common approach to prediction under distribution drift. This article studies its out-of-sample prediction error under nonstationarity. We provide a general decomposition of the excess ri…

  40. arXiv stat.ML TIER_1 English(EN) · Ye He, Krishnakumar Balasubramanian, Sayan Banerjee, Promit Ghosal ·

    Finite-Particle Rates for Regularized Stein Variational Gradient Descent

    arXiv:2602.05172v2 Announce Type: replace Abstract: We derive finite-particle rates for the regularized Stein variational gradient descent (R-SVGD) algorithm introduced by He et al. (2024) that corrects the constant-order bias of the SVGD by applying a resolvent-type precondition…

  41. arXiv stat.ML TIER_1 English(EN) · Zijian Liu ·

    Can Adaptive Gradient Methods Converge under Heavy-Tailed Noise? A Case Study of AdaGrad

    arXiv:2605.18694v1 Announce Type: cross Abstract: Many tasks in modern machine learning are observed to involve heavy-tailed gradient noise during the optimization process. To manage this realistic and challenging setting, new mechanisms, such as gradient clipping and gradient no…

  42. arXiv stat.ML TIER_1 English(EN) · Zijian Liu ·

    Clipped Gradient Methods for Nonsmooth Convex Optimization under Heavy-Tailed Noise: A Refined Analysis

    arXiv:2512.23178v3 Announce Type: replace-cross Abstract: Optimization under heavy-tailed noise has become popular recently, since it better fits many modern machine learning tasks, as captured by empirical observations. Concretely, instead of a finite second moment on gradient n…

  43. arXiv stat.ML TIER_1 English(EN) · Zijian Liu ·

    Can Adaptive Gradient Methods Converge under Heavy-Tailed Noise? A Case Study of AdaGrad

    Many tasks in modern machine learning are observed to involve heavy-tailed gradient noise during the optimization process. To manage this realistic and challenging setting, new mechanisms, such as gradient clipping and gradient normalization, have been introduced to ensure the co…
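Several of the sources above (for example entries 41 to 43) refer to gradient clipping as a mechanism for coping with heavy-tailed gradient noise. The following is a minimal, generic sketch of norm clipping inside an SGD loop; it is not taken from any of the listed papers. The quadratic objective, the Pareto noise model, and every parameter value are assumptions made purely for illustration.

```python
import math
import random

# Illustrative only: generic clipped SGD, not the algorithm of any listed paper.

def clip_by_norm(g, max_norm):
    # Rescale g so its Euclidean norm is at most max_norm; direction is preserved.
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm <= max_norm:
        return list(g)
    scale = max_norm / norm
    return [gi * scale for gi in g]

def clipped_sgd(x0, lr, max_norm, steps, seed=0):
    # Minimizes f(x) = 0.5 * ||x||^2 from noisy gradients.
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        # Symmetric Pareto noise with tail index 1.5 (assumed model):
        # finite mean, infinite variance, i.e. heavy-tailed.
        noise = [rng.choice((-1.0, 1.0)) * (rng.paretovariate(1.5) - 1.0) for _ in x]
        g = [xi + ni for xi, ni in zip(x, noise)]
        g = clip_by_norm(g, max_norm)  # bound the size of each update
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

if __name__ == "__main__":
    print(clipped_sgd([5.0, -3.0], lr=0.05, max_norm=1.0, steps=2000))
```

Clipping caps the length of every step at `lr * max_norm`, so a single extreme noise sample cannot throw the iterate arbitrarily far, at the cost of introducing some bias into the gradient estimate.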