New OptMuon Method Enhances Stochastic Optimization with Adaptive Momentum

arXiv cs.AI TIER_1 English(EN) · Yash Vardhan Tomar, Dheeraj Peddireddy, Vaneet Aggarwal · 2026-06-12 04:00

SymQNet: Amortized Acquisition for Low-Latency Adaptive Hamiltonian Learning

arXiv:2606.12808v1 Announce Type: cross Abstract: Adaptive Hamiltonian learning is central to calibrating and characterizing quantum devices. In an adaptive controller, choosing the next experiment is itself a computation. Bayesian design rules are recomputed after every posterio…

arXiv cs.LG TIER_1 English(EN) · Meher Bhaskar · 2026-06-11 17:11

Simplex-Constrained Sparse Bagging: From Uniform Priors to Sparse Posteriors in Ensemble Learning

We present Simplex-Constrained Sparse Bagging (SCSB), a mathematically rigorous framework for post-training compression and probability calibration of bootstrap-based bagging ensembles. Standard bagging ensembles (such as Random Forests, Bagged SVMs, and Bagged Neural Networks) a…
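
SCSB's exact optimization problem is cut off in this abstract. The sketch below illustrates only the simplex constraint itself. It uses the standard sort-based Euclidean projection onto the probability simplex, a well-known primitive that is not necessarily what SCSB uses. The projection turns raw per-member scores into nonnegative ensemble weights that sum to one and are often sparse. The function name and the example scores are illustrative.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum_i w_i = 1}.

    Sort entries in decreasing order, find the largest k whose k-th entry
    stays positive after subtracting the threshold (cumsum_k - 1) / k,
    then subtract that threshold everywhere and clip at zero.
    """
    u = sorted(v, reverse=True)
    cumulative = 0.0
    tau = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - 1.0) / k
        if u_k - candidate > 0:
            tau = candidate
    return [max(x - tau, 0.0) for x in v]

# Hypothetical raw scores for five bagged members.
weights = project_to_simplex([0.9, 0.1, 0.6, -0.2, 0.05])
```

Only the two highest-scoring members keep nonzero weight (0.65 and 0.35). The other three are pruned, which is the kind of sparsity a simplex constraint induces.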

arXiv cs.LG TIER_1 English(EN) · Handi Zhang, Adrienne M. Propp, Brooks Kinch, Houman Owhadi, Nathaniel Trask · 2026-06-11 04:00

Structure-Preserving Neural Surrogate Models with Tractable Uncertainty Quantification

arXiv:2606.11650v1 Announce Type: new Abstract: Recent advances in scientific machine learning provide a means of near-real-time solution to partial differential equations (PDEs), but lack the theoretical underpinnings of conventional simulators that support contemporary verifica…

arXiv cs.CL TIER_1 English(EN) · Yucheng Li, Huiqiang Jiang, Yang Xu, Jianxin Yang, Yi Zhang, Yizhong Cao, Yuhao Shen, Fan Zhou, Rui Men, Jianwei Zhang, An Yang, Bowen Yu, Bo Zheng, Fei Huang, Junyang Lin, Dayiheng Liu, Jingren Zhou · 2026-06-11 04:00

Breaking the Entropy Bound: Accelerating MTP Reinforcement Learning Training via Rejection Sampling

arXiv:2606.12370v1 Announce Type: cross Abstract: Reinforcement learning (RL) has become a key component in modern large language models, yet the rollout stage remains the key bottleneck in RL training pipelines. Although Multi-Token Prediction (MTP) offers a natural solution to …

arXiv cs.LG TIER_1 English(EN) · Shira Vansover-Hager, Matan Schliserman, Ofir Schlisselberg, Tomer Koren · 2026-06-11 04:00

Mirror Descent Beyond Euclidean Stability: An Exponential Separation in Initialization Sensitivity

arXiv:2606.11431v1 Announce Type: new Abstract: Mirror Descent (MD) extends Gradient Descent (GD) beyond Euclidean geometry and has recently reappeared as a lens for KL-regularized policy optimization in reinforcement learning and LLM post-training. This raises a basic robustness…
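
With the negative-entropy mirror map, Mirror Descent on the probability simplex reduces to the multiplicative (exponentiated-gradient) update. Below is a minimal sketch of that classical instance. The quadratic objective is chosen only for illustration and is not the paper's KL-regularized policy setting. The sketch also does not reproduce the paper's initialization-sensitivity analysis.

```python
import math

def entropic_mirror_descent(grad, p0, step, iters):
    """Mirror Descent on the simplex with the negative-entropy mirror map.

    Each step is p_i <- p_i * exp(-step * g_i), renormalized, which is the
    closed form of argmin_p <g, p> + KL(p || p_prev) / step.
    """
    p = list(p0)
    for _ in range(iters):
        g = grad(p)
        unnormalized = [pi * math.exp(-step * gi) for pi, gi in zip(p, g)]
        total = sum(unnormalized)
        p = [u / total for u in unnormalized]
    return p

# Illustrative objective f(p) = 0.5 * ||p - c||^2 with c inside the simplex.
c = [0.2, 0.3, 0.5]
p_final = entropic_mirror_descent(lambda p: [pi - ci for pi, ci in zip(p, c)],
                                  p0=[1 / 3, 1 / 3, 1 / 3], step=1.0, iters=500)
```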

arXiv cs.LG TIER_1 English(EN) · Benjamin Leblanc, Louis-Jacob Lebel, Teddy Kana, Richard Kamel · 2026-06-11 04:00

Simplicity Suffices for Parameter Noise Injection in Stochastic Gradient Descent

arXiv:2606.12054v1 Announce Type: new Abstract: Injecting noise into the optimization process is a well-established technique for improving the training and generalization of deep neural networks. Yet, despite the breadth of existing approaches, it remains unclear which design ch…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-11 02:07

SymQNet: Amortized Acquisition for Low-Latency Adaptive Hamiltonian Learning

Adaptive Hamiltonian learning is central to calibrating and characterizing quantum devices. In an adaptive controller, choosing the next experiment is itself a computation. Bayesian design rules are recomputed after every posterior update, and that step can take seconds. Across h…

arXiv cs.CL TIER_1 English(EN) · Jingren Zhou · 2026-06-10 17:36

Breaking the Entropy Bound: Accelerating MTP Reinforcement Learning Training via Rejection Sampling

Reinforcement learning (RL) has become a key component in modern large language models, yet the rollout stage remains the key bottleneck in RL training pipelines. Although Multi-Token Prediction (MTP) offers a natural solution to accelerate rollouts through speculative decoding, …
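
The abstract refers to accelerating rollouts with speculative decoding. The sketch below shows the standard speculative-sampling acceptance rule as background; the paper's own entropy-aware scheme is not reproduced. A draft token x is accepted with probability min(1, p(x)/q(x)). Otherwise a replacement is drawn from the normalized residual max(p - q, 0), which keeps each emitted token distributed exactly as the target p. Small probability lists stand in for real model outputs, and the function names are illustrative.

```python
import random

def residual_distribution(p, q):
    """Normalized max(p - q, 0): where to resample after a rejection."""
    r = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    total = sum(r)
    return [ri / total for ri in r]

def speculative_step(p, q, rng):
    """Draft one token from q, then accept or reject it against the target p."""
    draft = rng.choices(range(len(q)), weights=q)[0]
    accept_prob = min(1.0, p[draft] / q[draft])
    if rng.random() < accept_prob:
        return draft, True
    resampled = rng.choices(range(len(p)), weights=residual_distribution(p, q))[0]
    return resampled, False

rng = random.Random(0)
samples = [speculative_step([0.5, 0.5], [0.9, 0.1], rng) for _ in range(20000)]
```

The expected acceptance rate equals sum_x min(p(x), q(x)), here 0.6. Raising that overlap between draft and target is what makes speculative rollouts cheaper.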

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-10 17:36

Breaking the Entropy Bound: Accelerating MTP Reinforcement Learning Training via Rejection Sampling

Reinforcement learning (RL) has become a key component in modern large language models, yet the rollout stage remains the key bottleneck in RL training pipelines. Although Multi-Token Prediction (MTP) offers a natural solution to accelerate rollouts through speculative decoding, …

arXiv cs.LG TIER_1 English(EN) · Richard Kamel · 2026-06-10 13:19

Simplicity Suffices for Parameter Noise Injection in Stochastic Gradient Descent

Injecting noise into the optimization process is a well-established technique for improving the training and generalization of deep neural networks. Yet, despite the breadth of existing approaches, it remains unclear which design choices truly matter in practice. In this work, we…
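
The truncated abstract does not say which noise-injection scheme the authors end up recommending. Below is a minimal sketch of one common form of parameter noise injection. It assumes isotropic Gaussian perturbations of the weights before each gradient evaluation, applied to a toy quadratic. The `noise_std`, `lr`, and objective values are illustrative choices.

```python
import random

def sgd_with_parameter_noise(grad, w0, lr, noise_std, steps, seed=0):
    """SGD where the gradient is evaluated at a randomly perturbed point.

    w_{t+1} = w_t - lr * grad(w_t + xi_t),  xi_t ~ N(0, noise_std^2 I).
    The update is applied to the clean weights w_t, not to the perturbed copy.
    """
    rng = random.Random(seed)
    w = list(w0)
    for _ in range(steps):
        perturbed = [wi + rng.gauss(0.0, noise_std) for wi in w]
        g = grad(perturbed)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

# Toy objective f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w_final = sgd_with_parameter_noise(lambda w: list(w), [5.0, -3.0],
                                   lr=0.1, noise_std=0.01, steps=500)
```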

arXiv cs.AI TIER_1 English(EN) · Ruinan Wang, Ian Nabney, Mohammad Golbabaee · 2026-06-10 04:00

Importance-Aware Scheduling for High-Dimensional Hyperparameter Optimization

arXiv:2606.10068v1 Announce Type: cross Abstract: Hyperparameter Optimization (HPO) is essential for building high-performing ML/DL models, yet conventional optimizers often struggle in high-dimensional spaces where evaluations are costly and progress is diluted across many low-i…

arXiv cs.LG TIER_1 English(EN) · Mingchen Ma, Guyang Cao, Jelena Diakonikolas, Ilias Diakonikolas · 2026-06-10 04:00

Efficiently Learning Drifting Halfspaces with Massart Noise

arXiv:2606.11149v1 Announce Type: new Abstract: We study the problem of learning a drifting concept in the presence of Massart noise. In this framework, an online learner has access to a history of independent samples whose labels are noisy versions of a target concept that may c…

arXiv cs.LG TIER_1 English(EN) · Ryo Sagawa, Daisuke Furihata, Yuto Miyatake · 2026-06-10 04:00

Accelerating SAV Optimization via Randomized Low-Rank Hessian Approximation

arXiv:2606.10562v1 Announce Type: cross Abstract: We propose a new optimization method, the Nyström-enhanced relaxed scalar auxiliary variable method (N-RSAV), which incorporates curvature information into the RSAV framework to accelerate convergence while preserving an uncondi…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-10 00:00

Breaking the Entropy Bound: Accelerating MTP Reinforcement Learning Training via Rejection Sampling

Bebop addresses the efficiency bottleneck in reinforcement learning training of large language models by optimizing multi-token prediction techniques through entropy-aware sampling and novel training objectives that improve acceptance rates and inference throughput.

arXiv cs.LG TIER_1 English(EN) · Ilias Diakonikolas · 2026-06-09 17:35

Efficiently Learning Drifting Halfspaces with Massart Noise

We study the problem of learning a drifting concept in the presence of Massart noise. In this framework, an online learner has access to a history of independent samples whose labels are noisy versions of a target concept that may change from round to round. The goal is to output…

arXiv cs.LG TIER_1 English(EN) · Jared Lawrence, Ari Kalinsky, Hannah Bradfield, Yair Carmon, Oliver Hinder · 2026-06-09 04:00

The Sample Complexity of Parameter-Free Stochastic Convex Optimization

arXiv:2506.11336v2 Announce Type: replace Abstract: We study the sample complexity of stochastic convex optimization when problem parameters such as the distance to optimality and the Lipschitz constant are unknown. We pursue two strategies. First, we develop a reliable model sel…

arXiv cs.AI TIER_1 English(EN) · Prayas Agrawal, Prateek Chanda, Ishita Khatri, Ganesh Ramakrishnan, Bamdev Mishra, Pratik Jawanpuria · 2026-06-09 04:00

Minibatch Selection via Partition-Matroid-Constrained Gradient Matching

arXiv:2606.07954v1 Announce Type: cross Abstract: Training large language models (LLMs) on heterogeneous data requires selecting minibatches that balance convergence speed with coverage across domains. Existing methods either select samples independently within each domain or rel…

arXiv cs.AI TIER_1 English(EN) · Stéphane Eilles-Chan Way, Hugo Percot, Quentin Cappart, Tias Guns, Louis-Martin Rousseau · 2026-06-09 04:00

Scaling Decision-Focused Learning to Large Problems with Lagrangian Decomposition

arXiv:2606.08797v1 Announce Type: cross Abstract: Decision-focused learning has shown great promise for addressing predict-then-optimize problems, particularly in the presence of under-specified models. However, its practical deployment is often hindered by high computational cos…

arXiv cs.AI TIER_1 English(EN) · Nico Daheim, Thomas Möllenhoff, Ming Liang Ang, Mohammad Emtiyaz Khan · 2026-06-09 04:00

SVRG and Subsequent Corrections

arXiv:2512.01930v2 Announce Type: replace-cross Abstract: Stochastic Variance Reduced Gradient (SVRG) and its variants aim to speed-up training by using gradient corrections. Originally proposed over a decade ago, these methods have never been connected to any Bayesian method at …
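
For reference, here is the classical SVRG estimator the abstract builds on. It replaces a plain stochastic gradient with grad f_i(w) - grad f_i(w_snap) + full_grad(w_snap). This estimate is unbiased, and its variance vanishes as w approaches the optimum. Below is a minimal one-dimensional least-squares sketch with illustrative data `a`, `b` and step size. It shows plain SVRG, not the Bayesian connection the paper develops.

```python
import random

def svrg(a, b, w0, lr, epochs, inner_steps, seed=0):
    """SVRG for f(w) = mean_i 0.5 * (a_i * w - b_i)^2 with scalar w."""
    rng = random.Random(seed)
    n = len(a)

    def grad_i(i, w):
        # Gradient of the i-th term 0.5 * (a_i * w - b_i)^2.
        return a[i] * (a[i] * w - b[i])

    w = w0
    for _ in range(epochs):
        w_snap = w
        full_grad = sum(grad_i(i, w_snap) for i in range(n)) / n
        for _ in range(inner_steps):
            i = rng.randrange(n)
            # Variance-reduced estimate, unbiased for the full gradient at w.
            g = grad_i(i, w) - grad_i(i, w_snap) + full_grad
            w -= lr * g
    return w

w_hat = svrg(a=[1.0, 2.0, 3.0, 4.0], b=[2.0, 4.0, 6.0, 8.0],
             w0=0.0, lr=0.02, epochs=30, inner_steps=50)
```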

arXiv cs.LG TIER_1 English(EN) · Liping Tao, Xindi Tong, Chee Wei Tan · 2026-06-09 04:00

Learning to Optimize via Differentiable Programming

arXiv:2601.16510v3 Announce Type: replace-cross Abstract: Solving massive-scale optimization problems requires scalable first-order methods with low per-iteration cost. This tutorial highlights a shift in optimization: using differentiable programming not only to execute algorith…

arXiv cs.LG TIER_1 English(EN) · Wentao Zhang, Yutong Zhang, Wentao Mo · 2026-06-09 04:00

Noise-Adaptive High-Probability Regret Bounds in Online Convex Optimization

arXiv:2606.08028v1 Announce Type: new Abstract: We study high-probability regret bounds for online convex optimization (OCO) with strongly convex losses and establish three results that resolve open questions at the intersection of noise adaptivity, feedback structure, and constr…

arXiv cs.LG TIER_1 English(EN) · Binh Nguyen, Trinh Tran, Truong X. Nghiem · 2026-06-09 04:00

LEAF: A Learning-Enabled ADMM Framework for Accelerated Convex Optimization

arXiv:2606.08993v1 Announce Type: new Abstract: We propose LEAF, a learning-enabled ADMM framework for accelerated convex optimization. The key idea is to approximate the Moreau envelope of the objective function using an Input Convex Neural Network (ICNN), resulting in a learned…
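
LEAF's key idea is to replace a proximal step inside ADMM with a learned approximation of the objective's Moreau envelope. For orientation, below is plain scaled-form ADMM on the toy splitting min 0.5*(x - a)^2 + lam*|z| subject to x = z. Both proximal steps have closed forms here, and the z-step is soft-thresholding. This is the classical baseline, not LEAF's ICNN-based variant. The values of `a`, `lam`, and `rho` are illustrative.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.|."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def admm_scalar_lasso(a, lam, rho=1.0, iters=200):
    """Scaled-form ADMM for min_x 0.5*(x - a)^2 + lam*|z|  s.t.  x = z."""
    x = z = u = 0.0
    for _ in range(iters):
        # x-update: argmin_x 0.5*(x - a)^2 + (rho/2)*(x - z + u)^2
        x = (a + rho * (z - u)) / (1.0 + rho)
        # z-update: prox of (lam/rho)*|.| evaluated at x + u
        z = soft_threshold(x + u, lam / rho)
        # Dual update on the scaled multiplier.
        u = u + x - z
    return x, z

x, z = admm_scalar_lasso(a=3.0, lam=1.0)
```

The exact solution of this toy problem is soft_threshold(3, 1) = 2. Both x and z converge to it.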

arXiv cs.LG TIER_1 English(EN) · Francesco Bullo · 2026-06-09 04:00

Predictive Coding with Bayesian Priors via Proximal Gradient

arXiv:2606.08374v1 Announce Type: cross Abstract: We recast predictive coding as continuous-time proximal gradient descent applied to a regularized maximum-a-posteriori (MAP) objective. We study first a single-level problem and then a multi-level hierarchy. For the single-level p…

arXiv cs.LG TIER_1 English(EN) · Ganzhao Yuan · 2026-06-09 04:00

OptMuon: A Closed-Loop Orthogonalized Momentum Method for Stochastic Optimization with Zero-Noise Optimality

arXiv:2606.08783v1 Announce Type: cross Abstract: Orthogonalized momentum updates, as used in Muon-style optimizers, have recently shown strong empirical stability in large-scale deep learning. However, existing orthogonalized methods are typically paired with constant or open-lo…
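
Muon-style optimizers orthogonalize the momentum matrix before using it as the update direction. Orthogonalizing means mapping the matrix to its polar factor U V^T, the nearest semi-orthogonal matrix. Below is a minimal pure-Python sketch that assumes the classical cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X. Muon itself uses tuned higher-order coefficients, and OptMuon's closed-loop step-size control is not reproduced. The learning rate, momentum, and helper names are illustrative.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def newton_schulz_orthogonalize(G, iters=30):
    """Approximate the polar factor U V^T of G with cubic Newton-Schulz.

    Scaling by the Frobenius norm puts all singular values in (0, 1],
    inside the (0, sqrt(3)) convergence region of the cubic iteration.
    """
    fro = math.sqrt(sum(x * x for row in G for x in row))
    X = [[x / fro for x in row] for row in G]
    for _ in range(iters):
        XXtX = matmul(matmul(X, transpose(X)), X)
        X = [[1.5 * x - 0.5 * y for x, y in zip(rx, ry)]
             for rx, ry in zip(X, XXtX)]
    return X

def orthogonalized_momentum_step(W, grad, M, lr=0.02, beta=0.95):
    """One illustrative step: M <- beta*M + grad, W <- W - lr * orth(M)."""
    M = [[beta * m + g for m, g in zip(rm, rg)] for rm, rg in zip(M, grad)]
    O = newton_schulz_orthogonalize(M)
    W = [[w - lr * o for w, o in zip(rw, ro)] for rw, ro in zip(W, O)]
    return W, M

# A symmetric positive definite matrix has the identity as its polar factor.
Q = newton_schulz_orthogonalize([[3.0, 1.0], [1.0, 2.0]])
```

Orthogonalization equalizes the singular values of the update, so every direction in the momentum moves with the same magnitude regardless of its scale.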

arXiv cs.AI TIER_1 English(EN) · Munsik Kim · 2026-06-08 04:00

Minimax Forecasting Rates for Smoothly Varying Distributions in Wasserstein Space

arXiv:2606.07325v1 Announce Type: cross Abstract: We study the minimax rate of estimating a future value $\mu_{t_n+h}$ of a curve $t\mapsto\mu_t$ in the $2$-Wasserstein space $\mathcal{P}_2(\mathbb{R}^d)$ from finitely many noisy snapshots of its past, under an adiabatic bound $\…

arXiv cs.LG TIER_1 English(EN) · Eshed Gal, Samy Wu Fung, Eldad Haber · 2026-06-08 04:00

Probabilistic Gaussian Homotopy: A Probability-Space Continuation Framework for Nonconvex Optimization

arXiv:2603.13546v2 Announce Type: replace Abstract: We introduce Probabilistic Gaussian Homotopy (PGH), a probability-space continuation framework for nonconvex optimization. Unlike classical Gaussian homotopy, which smooths the objective and uniformly averages gradients, PGH def…

arXiv cs.LG TIER_1 English(EN) · Ming Sun, Kun Yuan · 2026-06-08 04:00

Accelerated Decentralized Stochastic Gradient Descent for Strongly Convex Optimization

arXiv:2606.07496v1 Announce Type: new Abstract: Decentralized stochastic optimization is a fundamental paradigm for large-scale learning over networks, where agents communicate only with their neighbors and no central coordinator is required. For strongly convex problems, communi…

arXiv cs.LG TIER_1 English(EN) · Rohan Shravan · 2026-06-08 04:00

Reversible Foundations: Training a 120B Sparse MoE via State-Preserving Scaling

arXiv:2606.07404v1 Announce Type: new Abstract: This paper reports on training a hundred-billion-parameter sparse mixture of experts on a single eight-GPU node, end to end. LightningLM 0.1V is a recurrence-backbone language model family grown in four stages from a small dense see…

arXiv cs.LG TIER_1 English(EN) · Alma Rahat, Tinkle Chugh, Jonathan Fieldsend, Richard Allmendinger · 2026-06-08 04:00

Accelerating Multi-Objective Bayesian Optimization with Predictive Gradient Catalysts

arXiv:2606.06984v1 Announce Type: new Abstract: This paper presents a general acceleration mechanism for multi-objective Bayesian optimisation (MOBO) that leverages Gaussian process predictive gradients as auxiliary signals. Rather than replacing existing Pareto-compliant acquisi…

arXiv cs.LG TIER_1 English(EN) · Leonardo Galli, Curtis Fox, Wiebke Bartolomaeus, Mark Schmidt, Holger Rauhut · 2026-06-08 04:00

Flatland: Adventures in Large-Step-Size Gradient Descent

arXiv:2606.06722v1 Announce Type: new Abstract: The training of neural networks often entails objective functions that are not globally $L$-smooth. For these functions, it is both theoretically and practically difficult to reply to the question: what is the largest possible step …
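
Here is a tiny illustration of why no single global step-size bound exists when f is not globally L-smooth. For f(x) = x^4, the local curvature 12x^2 grows with |x|. As a result, the same step size that converges from x0 = 1 diverges from x0 = 3. The objective and numbers are illustrative, not taken from the paper.

```python
def gd_quartic(x0, step, iters=100, blowup=1e6):
    """Gradient descent on f(x) = x**4; returns (final x, diverged flag)."""
    x = x0
    for _ in range(iters):
        x = x - step * 4.0 * x ** 3  # f'(x) = 4x^3
        if abs(x) > blowup:
            return x, True
    return x, False

print(gd_quartic(1.0, 0.1))  # converges slowly toward 0
print(gd_quartic(3.0, 0.1))  # overshoots and diverges
```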

arXiv cs.AI TIER_1 English(EN) · Qingyue Zhang, Chang Chu, Tianren Peng, Qi Li, Xiangyang Luo, Zhihao Jiang, Shao-Lun Huang · 2026-06-08 04:00

LoRA-DA: Data-Aware Initialization for Low-Rank Adaptation via Asymptotic Analysis

arXiv:2510.24561v3 Announce Type: replace-cross Abstract: LoRA has become a widely adopted method for PEFT, and its initialization methods have attracted increasing attention. However, existing methods have notable limitations: many methods do not incorporate target-domain data, …

arXiv cs.LG TIER_1 English(EN) · Francesco Bullo · 2026-06-06 23:41

Predictive Coding with Bayesian Priors via Proximal Gradient

We recast predictive coding as continuous-time proximal gradient descent applied to a regularized maximum-a-posteriori (MAP) objective. We study first a single-level problem and then a multi-level hierarchy. For the single-level problem, we show that proximal gradient descent is …
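
Proximal gradient descent on a MAP objective with a Laplace (ℓ1) prior is the familiar ISTA iteration. Each step takes a gradient step on the data-fit term and then applies soft-thresholding. Below is a minimal discrete-time sketch. It assumes the objective 0.5*||y - W x||^2 + lam*||x||_1 with a small, illustrative W. The paper works in continuous time and with a multi-level hierarchy, and neither is reproduced here.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.| (the MAP step for a Laplace prior)."""
    return v - t if v > t else (v + t if v < -t else 0.0)

def ista(W, y, lam, step, iters=500):
    """Proximal gradient for min_x 0.5*||y - W x||^2 + lam*||x||_1."""
    n_rows, n_cols = len(W), len(W[0])
    x = [0.0] * n_cols
    for _ in range(iters):
        residual = [sum(W[i][j] * x[j] for j in range(n_cols)) - y[i]
                    for i in range(n_rows)]
        grad = [sum(W[i][j] * residual[i] for i in range(n_rows))
                for j in range(n_cols)]  # W^T (W x - y)
        x = [soft_threshold(x[j] - step * grad[j], step * lam)
             for j in range(n_cols)]
    return x

# With W = diag(2, 1) the problem separates: x_j = soft(w_j*y_j, lam) / w_j^2.
# step = 1/L with L = 4, the largest eigenvalue of W^T W.
x_map = ista(W=[[2.0, 0.0], [0.0, 1.0]], y=[3.0, 0.5], lam=1.0, step=0.25)
```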

arXiv cs.AI TIER_1 English(EN) · Merve Karakas, Christopher J. Williams, Emmanuel O. Balogun, Sadegh Sadeghi Tabas, Christian Brown, Nikhil Rao · 2026-06-06 04:00

Multi-Residual Networks for Subspace Preconditioning in Constrained Optimization

arXiv:2606.06300v1 Announce Type: new Abstract: We propose MResOpt, a staged residual neural network architecture for constrained optimization problems. Our architecture fits within predict-complete-correct pipelines and decomposes constraint satisfaction by priority through inte…

arXiv cs.LG TIER_1 English(EN) · Kun Yuan · 2026-06-05 17:51

Accelerated Decentralized Stochastic Gradient Descent for Strongly Convex Optimization

Decentralized stochastic optimization is a fundamental paradigm for large-scale learning over networks, where agents communicate only with their neighbors and no central coordinator is required. For strongly convex problems, communication efficiency is mainly determined by the co…
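
The basic decentralized template alternates gossip averaging with neighbors and a local gradient step. Below is a minimal noise-free sketch under these assumptions: a 4-agent ring with uniform weights 1/3, and local quadratics f_i(x) = 0.5*(x - b_i)^2, so the global minimizer is mean(b). It illustrates plain decentralized gradient descent, not the paper's accelerated method. The function names are illustrative.

```python
def ring_mixing_matrix(n):
    """Doubly stochastic weights: each agent averages itself and two neighbors."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W

def decentralized_gd(b, step, iters):
    """Gossip with ring neighbors, then a local gradient step (no gradient noise)."""
    n = len(b)
    W = ring_mixing_matrix(n)
    x = [0.0] * n
    for _ in range(iters):
        mixed = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Each agent uses only the gradient of its own local loss.
        x = [mixed[i] - step * (x[i] - b[i]) for i in range(n)]
    return x

x_agents = decentralized_gd(b=[0.0, 1.0, 2.0, 3.0], step=0.1, iters=300)
```

Because the mixing matrix is doubly stochastic, the network average converges to mean(b) = 1.5. Individual agents settle at a step-size-dependent distance from it. Removing that residual disagreement, and doing so with few communication rounds, is what more advanced decentralized methods aim for.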

arXiv cs.LG TIER_1 English(EN) · Rohan Shravan · 2026-06-05 15:48

Reversible Foundations: Training a 120B Sparse MoE via State-Preserving Scaling

This paper reports on training a hundred-billion-parameter sparse mixture of experts on a single eight-GPU node, end to end. LightningLM 0.1V is a recurrence-backbone language model family grown in four stages from a small dense seed, through a 5B and a 9B mixture of experts, to …

arXiv cs.AI TIER_1 English(EN) · Munsik Kim · 2026-06-05 14:43

Minimax Forecasting Rates for Smoothly Varying Distributions in Wasserstein Space

We study the minimax rate of estimating a future value $\mu_{t_n+h}$ of a curve $t\mapsto\mu_t$ in the $2$-Wasserstein space $\mathcal{P}_2(\mathbb{R}^d)$ from finitely many noisy snapshots of its past, under an adiabatic bound $\|\nabla_t^k v\|\le\varepsilon$ on the $k$-th covariant…

arXiv cs.LG TIER_1 English(EN) · Disi Lin, Martin Berggren, Tommy Löfstedt · 2026-06-05 04:00

Generalized TV--$\ell_p$ Structured Priors for Bayesian $T_1$ Mapping

arXiv:2606.05381v1 Announce Type: new Abstract: We propose an extended family of structured spatial priors that incorporates the total variation (TV) function with $\ell_p$ norms. The prior is proven to be proper and incorporated into a Bayesian regression framework to enable unc…

arXiv cs.LG TIER_1 English(EN) · Dongruo Zhou · 2026-06-05 04:00

Sharp First-Order Lower Bounds for Nonconvex Optimization under Higher-Order Smoothness

arXiv:2606.05438v1 Announce Type: new Abstract: We study the deterministic first-order oracle complexity of finding $\epsilon$-stationary points in smooth nonconvex optimization when the objective satisfies higher-order smoothness assumptions. While the classical \(\epsilon^{-2…

arXiv cs.LG TIER_1 English(EN) · Christian Coester, Alexa Tudose, Alexander Turoczy · 2026-06-05 04:00

Learning-Augmented Online Minimization with Dual Predictions

arXiv:2606.05380v1 Announce Type: cross Abstract: We present learning-augmented algorithms for two general classes of online minimization problems: metrical task systems and laminar set cover. Both algorithms achieve improved theoretical guarantees using machine-learned predictio…

arXiv cs.LG TIER_1 English(EN) · Andrea Martin, Ian R. Manchester, Luca Furieri · 2026-06-05 04:00

Learning to Optimize with Guarantees: A Complete Characterization of Linearly Convergent Algorithms

arXiv:2508.00775v2 Announce Type: replace-cross Abstract: The design of many classical optimization algorithms is driven by the certification of linear convergence rates over classes of optimization problems. In this paper, we consider the problem of improving the average-case pe…

arXiv cs.LG TIER_1 English(EN) · Mikhail Persiianov, Arip Asadulaev, Nikita Andreev, Nikita Starodubcev, Dmitry Baranchuk, Anastasis Kratsios, Evgeny Burnaev, Alexander Korotin · 2026-06-05 04:00

Inverse Entropic Optimal Transport Solves Semi-Supervised Learning via Data Likelihood Maximization

arXiv:2410.02628v5 Announce Type: replace Abstract: Learning conditional distributions $\pi^*(\cdot|x)$ is a central problem in machine learning, which is typically approached via supervised methods with paired data $(x,y) \sim \pi^*$. However, acquiring paired data samples is of…
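
Entropic optimal transport, the problem this paper inverts, is usually solved with Sinkhorn's alternating scaling. Below is a minimal sketch with an illustrative cost matrix, illustrative marginals, and regularization `eps`. It solves only the forward entropic OT problem, not the paper's inverse, likelihood-based formulation.

```python
import math

def sinkhorn(cost, mu, nu, eps=1.0, iters=500):
    """Entropic OT plan P = diag(u) K diag(v) with K = exp(-cost/eps).

    Alternately rescales rows and columns so that P has marginals mu and nu.
    """
    n, m = len(mu), len(nu)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

points_x, points_y = [0.0, 1.0, 2.0], [0.5, 1.5, 2.5]
cost = [[(a - b) ** 2 for b in points_y] for a in points_x]
plan = sinkhorn(cost, mu=[0.2, 0.5, 0.3], nu=[0.4, 0.4, 0.2])
```

A smaller `eps` gives a plan closer to unregularized OT, but Sinkhorn then needs more iterations to match the marginals.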

arXiv cs.AI TIER_1 English(EN) · Nikhil Rao · 2026-06-04 15:37

Multi-Residual Networks for Subspace Preconditioning in Constrained Optimization

We propose MResOpt, a staged residual neural network architecture for constrained optimization problems. Our architecture fits within predict-complete-correct pipelines and decomposes constraint satisfaction by priority through intermediate re-completion and stage-aware losses. T…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-04 10:10

Fast and Robust Convergence Rates for TD(0) with Linear Function Approximation, a Universal Learning Step, and i.i.d. Samples

In this paper, we study the finite-time behavior of the TD(0) temporal-difference method with linear function approximation (LFA). We consider on-policy independent and identically distributed (i.i.d.) samples, a constant learning step, and the Polyak-Juditsky averaging method. W…
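
The setting in the abstract can be sketched on a toy two-state chain: linear function approximation, i.i.d. on-policy samples, a constant step, and Polyak-Juditsky averaging. All choices here are illustrative. The features are one-hot (the tabular special case of linear approximation), transitions are uniform, rewards are [1, 0], and gamma = 0.5, which gives true values [1.5, 0.5]. The function name is hypothetical.

```python
import random

def td0_linear_averaged(features, rewards, P, start_dist, gamma, alpha, steps, seed=0):
    """TD(0) with linear features, i.i.d. (s, s') samples, Polyak-Juditsky averaging.

    s is drawn from start_dist (assumed to be the on-policy stationary
    distribution) and s' from the transition row P[s].
    """
    rng = random.Random(seed)
    n_states, dim = len(features), len(features[0])
    theta = [0.0] * dim
    theta_avg = [0.0] * dim

    def value(s):
        return sum(f * t for f, t in zip(features[s], theta))

    for k in range(1, steps + 1):
        s = rng.choices(range(n_states), weights=start_dist)[0]
        s_next = rng.choices(range(n_states), weights=P[s])[0]
        td_error = rewards[s] + gamma * value(s_next) - value(s)
        theta = [t + alpha * td_error * f for t, f in zip(theta, features[s])]
        # Running mean of theta_1, ..., theta_k.
        theta_avg = [a + (t - a) / k for a, t in zip(theta_avg, theta)]
    return theta_avg

theta_bar = td0_linear_averaged(
    features=[[1.0, 0.0], [0.0, 1.0]],  # one-hot features
    rewards=[1.0, 0.0],
    P=[[0.5, 0.5], [0.5, 0.5]],
    start_dist=[0.5, 0.5],
    gamma=0.5, alpha=0.05, steps=20000)
```

With a constant step, the raw iterate theta keeps fluctuating around the TD fixed point. The averaged iterate smooths those fluctuations out.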

arXiv cs.LG TIER_1 English(EN) · Aleksandar Armacki, Dragana Bajović, Dušan Jakovetić, Soummya Kar, Ali H. Sayed · 2026-06-04 04:00

Long-Run Tail Decay of (Clipped) SGD in Nonconvex Optimization

arXiv:2602.05657v2 Announce Type: replace Abstract: The study of tail behaviour of SGD-induced processes has been attracting a lot of interest, due to offering strong guarantees with respect to individual runs of an algorithm. While many works provide high-probability guarantees,…
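
Clipped SGD rescales each stochastic gradient to norm at most tau before stepping, which tames heavy-tailed noise. Below is a minimal scalar sketch on f(x) = 0.5*x^2. It assumes symmetric Pareto-tailed gradient noise with tail index 1.5, so the noise variance is infinite. The values of tau, the step size, and the noise model are illustrative, and nothing here reproduces the paper's tail-decay analysis.

```python
import random

def clip(g, tau):
    """Scale g so that |g| <= tau (scalar case of norm clipping)."""
    return g if abs(g) <= tau else tau * (1.0 if g > 0 else -1.0)

def clipped_sgd(x0, lr, tau, steps, seed=0):
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        # Symmetric heavy-tailed noise: random sign times (Pareto(1.5) - 1).
        noise = rng.choice((-1.0, 1.0)) * (rng.paretovariate(1.5) - 1.0)
        g = x + noise  # stochastic gradient of 0.5 * x^2
        x -= lr * clip(g, tau)
    return x

x_final = clipped_sgd(x0=5.0, lr=0.01, tau=1.0, steps=5000)
```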

arXiv cs.LG TIER_1 English(EN) · Julius Durmann, Amelie Kleber · 2026-06-04 04:00

Mean-Based Algorithms: Lower Bounds and Regret

arXiv:2606.04931v1 Announce Type: new Abstract: Mean-based algorithms are a class of online learning algorithms that assign low probability to actions with low average rewards. Recent work indicates these algorithms converge favorably to serially undominated actions, which approx…

arXiv cs.CL TIER_1 English(EN) · Rishit Dagli, Abir Harrasse, Luke Zhang, Florent Draye, Amirali Abdullah, Bernhard Schölkopf, Zhijing Jin · 2026-06-04 04:00

STRIDE: Training Data Attribution via Subset-Perturbation Sparse Recovery

arXiv:2606.05165v1 Announce Type: cross Abstract: Training Data Attribution (TDA) seeks to trace a model's predictions back to its training data. The gold standard for TDA relies on causal interventions, observing how a model changes when data is added or removed, but repeated re…

arXiv cs.LG TIER_1 English(EN) · Zhijing Jin · 2026-06-03 17:59

STRIDE: Training Data Attribution via Subset-Perturbation Sparse Recovery

Training Data Attribution (TDA) seeks to trace a model's predictions back to its training data. The gold standard for TDA relies on causal interventions, observing how a model changes when data is added or removed, but repeated retraining is computationally challenging for Large …

arXiv cs.LG TIER_1 English(EN) · Amelie Kleber · 2026-06-03 14:23

Mean-Based Algorithms: Lower Bounds and Regret

Mean-based algorithms are a class of online learning algorithms that assign low probability to actions with low average rewards. Recent work indicates these algorithms converge favorably to serially undominated actions, which approximate Nash equilibria in economic games. However…
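
Hedge (multiplicative weights) is a textbook mean-based algorithm. The probability of each action decays exponentially in its cumulative loss, so actions with low average reward quickly receive low probability. Below is a minimal sketch that assumes losses in [0, 1] and an illustrative learning rate `eta`. The function names are hypothetical.

```python
import math

def hedge_probabilities(cumulative_losses, eta):
    """Hedge distribution: p_i proportional to exp(-eta * L_i)."""
    best = min(cumulative_losses)  # shift for numerical stability
    weights = [math.exp(-eta * (L - best)) for L in cumulative_losses]
    total = sum(weights)
    return [w / total for w in weights]

def run_hedge(loss_rounds, eta):
    """Return the distribution played in each round."""
    n_actions = len(loss_rounds[0])
    cumulative = [0.0] * n_actions
    history = []
    for losses in loss_rounds:
        p = hedge_probabilities(cumulative, eta)
        history.append(p)
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return history

# Three actions with constant losses 0.2, 0.5, 0.9: action 0 is best on average.
history = run_hedge([[0.2, 0.5, 0.9]] * 200, eta=0.1)
```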

arXiv cs.LG TIER_1 English(EN) · Luo Luo, Xue Cui, Tingkai Jia, Cheng Chen · 2026-06-03 04:00

Decentralized Stochastic Nonconvex Optimization under $(L_0,L_1)$-Smoothness

arXiv:2509.08726v3 Announce Type: replace-cross Abstract: This paper focuses on the decentralized stochastic optimization problem $f(\mathbf{x})=\frac{1}{m}\sum_{i=1}^m f_i(\mathbf{x})$ over a connected network of $n$ agents, where each local function has the form of $f_i(\mathbf…

arXiv cs.LG TIER_1 English(EN) · Moses Charikar, Chirag Pabbaraju, Ambuj Tewari · 2026-06-03 04:00

From Nonconvex to Strongly Convex: Curvature-Adaptive FTPL for Online Optimization

arXiv:2606.02948v1 Announce Type: new Abstract: Curvature adaptivity is a classical theme in online optimization: for convex Lipschitz losses, adaptive methods interpolate between the optimal $O(\sqrt{T})$ regret for general convex losses and $O(\log T)$ regret under strong conve…

arXiv cs.AI TIER_1 English(EN) · Han Fang, Paul Weng, Yutong Ban · 2026-06-03 04:00

ASAP: Exploiting the Satisficing Generalization Advantage in Neural Combinatorial Optimization

arXiv:2501.17377v4 Announce Type: replace-cross Abstract: Deep Reinforcement Learning (DRL) has emerged as a promising approach for solving Combinatorial Optimization (CO) problems, such as the 3D Bin Packing Problem (3D-BPP), Traveling Salesman Problem (TSP), or Vehicle Routing …

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-03 00:00

STRIDE: Training Data Attribution via Subset-Perturbation Sparse Recovery

STRIDE framework enables efficient training data attribution for LLMs by modeling functional effects in activation space through sparse recovery and steering operators, achieving superior speed and accuracy compared to traditional gradient-based methods.

arXiv cs.LG TIER_1 English(EN) · Yue Wu, Weiqiang Zheng, Yang Cai, Haipeng Luo · 2026-06-02 04:00

Accelerating Min-Max Optimization via Power-Law Stepsizes

arXiv:2606.01764v1 Announce Type: cross Abstract: We revisit the convergence guarantees of the Extragradient (EG) method for unconstrained biaffine min-max optimization. It is known that EG with a fixed stepsize achieves a $\Theta(T^{-1/2})$ last-iterate convergence rate, which i…
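
On the simplest bilinear game, min_x max_y x*y, simultaneous gradient descent-ascent spirals outward. The Extragradient method's look-ahead step instead pulls the iterates inward. Below is a minimal fixed-stepsize sketch that assumes step size 0.5 and this scalar game. On this well-conditioned instance EG contracts geometrically, so it does not illustrate the Θ(T^{-1/2}) rate quoted in the abstract. The paper's power-law stepsize schedule is not reproduced.

```python
import math

def operator(x, y):
    """F(x, y) = (df/dx, -df/dy) for f(x, y) = x * y."""
    return y, -x

def gda_step(x, y, eta):
    gx, gy = operator(x, y)
    return x - eta * gx, y - eta * gy

def extragradient_step(x, y, eta):
    gx, gy = operator(x, y)
    x_half, y_half = x - eta * gx, y - eta * gy  # extrapolation (look-ahead)
    hx, hy = operator(x_half, y_half)
    return x - eta * hx, y - eta * hy             # update from the original point

def run(step_fn, eta=0.5, iters=100):
    """Distance from the saddle point (0, 0) after iters steps from (1, 1)."""
    x, y = 1.0, 1.0
    for _ in range(iters):
        x, y = step_fn(x, y, eta)
    return math.hypot(x, y)

print(run(gda_step), run(extragradient_step))
```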

arXiv cs.LG TIER_1 English(EN) · Nicholas Knight · 2026-06-02 04:00

Riemannian Gradient Descent for Low-Rank Architectures

arXiv:2606.02328v1 Announce Type: new Abstract: We explore Riemannian optimization techniques for rank-factored matrix parameters, targeting contemporary deep learning applications. We examine ten points in the algorithm design space: two geometries for rank-$r$ matrices, three g…

arXiv cs.LG TIER_1 English(EN) · Matthew Regehr, Gautam Kamath, Andrew Lowy · 2026-06-02 04:00

Near-Optimal Pure Machine Unlearning for Smooth, Strongly Convex Losses

arXiv:2606.01527v1 Announce Type: new Abstract: Machine unlearning is motivated by legal and user-facing requirements to remove the influence of individuals' data from trained models, such as the right to be forgotten. Prior work has developed algorithms and error bounds for unle…

arXiv cs.LG TIER_1 English(EN) · Gishnu Madhu, Feng Liu, Souma Chowdhury · 2026-06-02 04:00

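Learned Directed-Graph Abstractions of Combinatorial Spaces for Order-Preserving Search in Mixed-Combinatorial Nonlinear Optimization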

arXiv:2606.01425v1 Announce Type: new Abstract: Mixed-combinatorial nonlinear programming (MCNLP) problems arise in many engineering design and planning applications, e.g., due to categorical, component, and geometric design choices, as well as joint task and motion planning. Tra…

arXiv cs.LG TIER_1 English(EN) · Shion Takeno · 2026-06-02 04:00

Optimal-Point Variance Reduction for Bayesian Optimization with Regret Bound Guarantees

arXiv:2606.00956v1 Announce Type: new Abstract: This paper studies a one-step lookahead Bayesian optimization (BO) method and its theoretical guarantee. Although the empirical effectiveness of one-step lookahead BO methods, such as entropy search, has been studied extensively, th…

arXiv cs.LG TIER_1 English(EN) · Bing Liu, Wenjie Zhou, Chengcheng Zhao · 2026-06-02 04:00

Rethinking Bregman Divergences in Kronecker-Factored Optimizers

arXiv:2606.00542v1 Announce Type: new Abstract: Shampoo-style optimizers approximate gradient covariance matrices using Kronecker-factored structures. Recent work~\cite{lin2026understanding} showed that such approximations can be viewed as projections under Bregman matrix diverge…

arXiv cs.AI TIER_1 English(EN) · Shuhei Watanabe, Frank Hutter · 2026-06-02 04:00

c-TPE: Tree-Structured Parzen Estimator with Inequality Constraints for Expensive Hyperparameter Optimization

arXiv:2211.14411v5 Announce Type: replace-cross Abstract: Hyperparameter optimization (HPO) is crucial for strong performance of deep learning algorithms and real-world applications often impose some constraints, such as on memory usage or latency, on top of the performance requi…
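
c-TPE extends the core Tree-structured Parzen Estimator step. That step splits past observations at a quantile gamma into "good" and "bad" groups, fits a density to each, and proposes the candidate maximizing l(x)/g(x). Below is a minimal one-dimensional sketch that assumes fixed-bandwidth Gaussian kernels and an illustrative objective. It omits the inequality-constraint handling that is c-TPE's contribution.

```python
import math

def gaussian_kde(points, bandwidth):
    """Return a 1-D Gaussian kernel density estimate as a function."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)

    return density

def tpe_propose(xs, losses, candidates, gamma=0.25, bandwidth=0.5):
    """Pick the candidate maximizing l(x) / g(x)."""
    order = sorted(range(len(xs)), key=lambda i: losses[i])
    n_good = max(1, math.ceil(gamma * len(xs)))
    good = [xs[i] for i in order[:n_good]]   # lowest-loss observations
    bad = [xs[i] for i in order[n_good:]]
    l, g = gaussian_kde(good, bandwidth), gaussian_kde(bad, bandwidth)
    return max(candidates, key=lambda x: l(x) / g(x))

xs = [0.5 * k for k in range(11)]          # observed configs 0.0 .. 5.0
losses = [(x - 2.0) ** 2 for x in xs]      # illustrative objective
candidates = [0.1 * k for k in range(51)]
next_x = tpe_propose(xs, losses, candidates)
```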

arXiv cs.AI TIER_1 English(EN) · Mohammad Rashed, Duarte F. Valoroso Madeira, Babak Gholami, Caglar Guerbuez, Yunjia Yang, Nils Thuerey · 2026-06-02 04:00

On Generalization in Topology Optimization: Sensitivity-Conditioned Bernoulli Flow Matching

arXiv:2606.02179v1 Announce Type: cross Abstract: Surrogate models for topology optimization (TO) exhibit highly variable out-of-distribution (OOD) generalization under distribution shifts such as changing loads or boundary conditions, yet the source of this variability remains u…

arXiv cs.AI TIER_1 English(EN) · Munsik Kim · 2026-06-02 04:00

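Information-Theoretic Lower Bounds for Bit-Constrained Stochastic Optimization via Reduction to Compressed Gaussian Mean Estimation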

arXiv:2606.00703v1 Announce Type: cross Abstract: Low-precision pretraining (FP8, MXFP4, NVFP4) is now standard for frontier language models, yet the literature is almost entirely achievability -- algorithms and empirical scaling laws -- with no matching characterization of what …
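
Unbiased stochastic rounding is a common building block for bit-constrained, unbiased compressors. It is one example of the compressors that compressed mean estimation studies. Below is a minimal sketch that assumes values in [0, 1] and a uniform b-bit grid. It illustrates unbiasedness only. It is neither the paper's lower-bound construction nor an actual FP8, MXFP4, or NVFP4 format.

```python
import math
import random

def stochastic_round(x, bits, rng):
    """Unbiased b-bit quantization of x in [0, 1] onto a uniform grid.

    Rounds up with probability equal to the fractional position between
    neighboring grid points, so E[q(x)] = x.
    """
    levels = 2 ** bits - 1
    scaled = x * levels
    lower = math.floor(scaled)
    q = lower + (1 if rng.random() < scaled - lower else 0)
    return q / levels

rng = random.Random(0)
x = 0.3
estimates = [stochastic_round(x, bits=3, rng=rng) for _ in range(20000)]
mean_estimate = sum(estimates) / len(estimates)
```

Each sample is coarse (2/7 or 3/7 here), but the samples average to the true value. The price of fewer bits shows up as extra variance rather than bias.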

arXiv cs.AI TIER_1 English(EN) · Dongjun Kim, Adrian de Wynter, Huancheng Chen, Heasung Kim, Haris Vikalo · 2026-06-02 04:00

Foundation-Preserving Adaptation via Generalized Rayleigh Quotient Optimization

arXiv:2606.00132v1 Announce Type: cross Abstract: While finetuning effectively adapts foundation models to specialized downstream tasks, it can degrade nontarget capabilities acquired during pretraining. Existing forgetting aware methods typically seek safer updates through speci…

arXiv cs.AI TIER_1 English(EN) · Yi-Xiang Hu · 2026-06-02 04:00

Position Paper: Post-Solution Robustness of Decision Engines: Feasible Regions and Smoothness under Perturbation

arXiv:2606.00002v1 Announce Type: new Abstract: Mixed-Integer Linear Programming (MILP) decision engines routinely output nominally optimal plans for high-stakes industrial systems. Yet deployment rarely matches solve-time assumptions: small perturbations in costs, demands, or re…

arXiv cs.LG TIER_1 English(EN) · Chengfeng Wu, Tao Zou, Yanru Wu, Jingge Wang · 2026-06-02 04:00

CORE-MTL: Rethinking Gradient Balancing via Causal Orthogonal Representations

arXiv:2606.02221v1 Announce Type: cross Abstract: Multi-task learning (MTL) aims to construct a joint model for multiple tasks by sharing a common representation across domains. To achieve this goal, existing optimization-centric methods either balance task gradients or modify th…

arXiv cs.LG TIER_1 English(EN) · Minduli Wijayatunga, Roberto Armellin · 2026-06-02 04:00

Tiny Recursive Models for Solving the J2-Perturbed Lambert Problem

arXiv:2606.00895v1 Announce Type: cross Abstract: This paper presents a fast, recursive neural solver for the J2-perturbed Lambert problem based on Tiny Recursive Models (TRM), termed the TRM-Perturbed Lambert (TRM-PL) model. TRM is a weight-shared architecture whose effective ca…

arXiv cs.LG TIER_1 English(EN) · Dingzhi Yu, Wei Jiang, Hongyi Tao, Yuanyu Wan, Lijun Zhang · 2026-06-02 04:00

Mirror Descent under Generalized Smoothness

arXiv:2502.00753v4 Announce Type: replace-cross Abstract: Smoothness is crucial for attaining fast rates in first-order optimization. However, many optimization problems in modern machine learning involve non-smooth objectives. Recent studies relax the smoothness assumption by al…

arXiv cs.LG TIER_1 English(EN) · Edwige Cyffers, Alireza Mirrokni, Marco Mondelli · 2026-06-02 04:00

Optimal Regularization for Performative Learning

arXiv:2510.12249v2 Announce Type: replace Abstract: In performative learning, the data distribution reacts to the deployed model - for example, because strategic users adapt their features to game it - which creates a more complex dynamic than in classical supervised learning. On…

arXiv cs.LG TIER_1 English(EN) · Jiayu Zhang, Tianyi Lin · 2026-06-02 04:00

Scale-Invariant Neural Network Optimization: Norm Geometry and Heavy-Tailed Noise

arXiv:2605.18528v2 Announce Type: replace-cross Abstract: A growing lesson from neural network optimization is that optimizer design should respect how the model is parametrized. Scale-invariant methods become important because their normalized layerwise updates can not only supp…

arXiv cs.LG TIER_1 English(EN) · Nicholas Knight · 2026-06-01 14:40

Riemannian Gradient Descent for Low-Rank Architectures

We explore Riemannian optimization techniques for rank-factored matrix parameters, targeting contemporary deep learning applications. We examine ten points in the algorithm design space: two geometries for rank-$r$ matrices, three geometries for rank-$r$ partial isometries, and b…

arXiv cs.LG TIER_1 English(EN) · Jingge Wang · 2026-06-01 13:20

CORE-MTL: Rethinking Gradient Balancing via Causal Orthogonal Representations

Multi-task learning (MTL) aims to construct a joint model for multiple tasks by sharing a common representation across domains. To achieve this goal, existing optimization-centric methods either balance task gradients or modify the shared architecture. However, as these approache…

arXiv cs.AI TIER_1 English(EN) · Nils Thuerey · 2026-06-01 12:36

On Generalization in Topology Optimization: Sensitivity-Conditioned Bernoulli Flow Matching

Surrogate models for topology optimization (TO) exhibit highly variable out-of-distribution (OOD) generalization under distribution shifts such as changing loads or boundary conditions, yet the source of this variability remains unclear. We hypothesize that OOD performance is gov…

arXiv cs.AI TIER_1 English(EN) · Zeou Hu, Kelvin Ho, Yaoliang Yu · 2026-06-01 04:00

A Unified Framework for Gradient Aggregation in Multi-Objective Optimization

arXiv:2605.30452v1 Announce Type: cross Abstract: Many machine learning problems involve multiple inherent trade-offs that are best addressed by gradient-based multi-objective optimization (MOO) algorithms. Existing methods are often proposed with various motivations, analyzed ca…

arXiv cs.AI TIER_1 English(EN) · Yansen Zhang, Qingcan Kang, Yujie Chen, Yufei Wang, Xiongwei Han, Tao Zhong, Mingxuan Yuan, Chen Ma · 2026-06-01 04:00

SAC-Opt: Semantic-Anchor-Based Iterative Correction for Optimization Modeling

arXiv:2510.05115v3 Announce Type: replace Abstract: Large language models (LLMs) have opened new paradigms in optimization modeling by enabling the generation of executable solver code from natural language descriptions. Despite this promise, existing approaches typically remain …

arXiv cs.LG TIER_1 English(EN) · Sharan Vaswani, Yifan Sun, Reza Babanezhad · 2026-06-01 04:00

Convergence of Steepest Descent and Adam under Non-Uniform Smoothness

arXiv:2605.30648v1 Announce Type: new Abstract: Recent work has analyzed the convergence of first-order methods under non-uniform smoothness assumptions that better model the loss landscape in machine learning tasks. We generalize this assumption to objectives whose curvature is …

arXiv cs.LG TIER_1 English(EN) · Abhishek Chakraborty, Angelia Nedić · 2026-06-01 04:00

Random Feasibility Methods with Adaptive Stepsizes for Constrained Optimization

arXiv:2601.20076v2 Announce Type: replace-cross Abstract: We consider minimizing an objective function subject to constraints defined by the intersection of lower-level sets of convex functions. We study two cases: (i) strongly convex and Lipschitz-smooth objective function and (…

arXiv cs.LG TIER_1 English(EN) · Shengyu Feng, Tarun Suresh, Yiming Yang · 2026-06-01 04:00

Unsupervised Diffusion Solvers for Combinatorial Optimization via Combinatorial Adjoint Matching

arXiv:2605.30920v1 Announce Type: new Abstract: Diffusion-based neural solvers have shown strong promise for combinatorial optimization (CO), but existing methods typically rely on supervised training with large collections of near-optimal solutions. In this work, we extend adjoi…

arXiv cs.LG TIER_1 English(EN) · Junbin Qiu, Zhaowei Hong, Renzhe Xu, Yao Shu · 2026-06-01 04:00

Revisiting Zeroth-Order Hessian Approximation: A One-Step Policy Optimization Perspective

arXiv:2605.30960v1 Announce Type: new Abstract: Accurate Zeroth-Order (ZO) Hessian estimation is a cornerstone of derivative-free methods, essential for tasks such as bilevel optimization, Bayesian inference, and uncertainty quantification. However, obtaining a complete suite of …

arXiv cs.LG TIER_1 English(EN) · Zihao Chen · 2026-06-01 04:00

A Unified View of Anchoring via Operator-Side Tikhonov Regularization

arXiv:2605.30905v1 Announce Type: cross Abstract: Anchored fixed point and monotone equation methods, including Halpern iteration, extra anchored gradient, and their relatives, add a vanishing pull toward a reference point to obtain last-iterate guarantees. Existing anchored vari…
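
Halpern iteration is the prototypical anchored method named in the abstract. It mixes each step with a vanishing pull toward the starting point: x_{k+1} = beta_k x_0 + (1 - beta_k) T(x_k). Below is a minimal sketch that assumes the common choice beta_k = 1/(k+2) and a 90-degree rotation as the nonexpansive operator T. Plain fixed-point iteration cycles forever on this T, while the anchored iterates approach the fixed point at the origin. The paper's operator-side Tikhonov view is not reproduced.

```python
import math

def rotate90(z):
    """Nonexpansive map T(x, y) = (-y, x); its only fixed point is the origin."""
    x, y = z
    return (-y, x)

def halpern(T, z0, iters):
    """Anchored iteration with beta_k = 1 / (k + 2)."""
    z = z0
    for k in range(iters):
        beta = 1.0 / (k + 2)
        tz = T(z)
        z = tuple(beta * a + (1.0 - beta) * b for a, b in zip(z0, tz))
    return z

def picard(T, z0, iters):
    """Plain fixed-point iteration z <- T(z), for comparison."""
    z = z0
    for _ in range(iters):
        z = T(z)
    return z

z_anchor = halpern(rotate90, (1.0, 0.0), 1000)
z_plain = picard(rotate90, (1.0, 0.0), 1001)
```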

arXiv cs.LG TIER_1 English(EN) · Ferhat Erata, Orr Paradise, Thanos Typaldos, Timos Antonopoulos, ThanhVu Nguyen, Shafi Goldwasser, Ruzica Piskac · 2026-06-01 04:00

Learning Randomized Reductions

arXiv:2412.18134v4 Announce Type: replace Abstract: Randomized self-reductions (RSRs) express $f(x)$ using $f$ evaluated at random correlated points, enabling self-correcting programs, instance-hiding protocols, and applications in complexity theory and cryptography. Yet discover…
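
Here is a classical, hand-derived randomized self-reduction, the kind of object the paper aims to discover automatically. For a linear function f(x) = C*x mod P, the identity f(x) = f(x + r) - f(r) mod P holds for uniformly random r. Each query point is therefore individually uniform. A majority vote over independent draws of r then self-corrects an oracle that errs on a small fraction of inputs. The modulus, the constant, and the faulty oracle are illustrative assumptions.

```python
import random
from collections import Counter

P, C = 101, 37  # illustrative prime modulus and linear coefficient

def true_f(x):
    return (C * x) % P

def faulty_oracle(x):
    """Agrees with f except on multiples of 10 (about 11% of Z_101)."""
    return (true_f(x) + 1) % P if x % 10 == 0 else true_f(x)

def self_corrected(oracle, x, trials, rng):
    """Majority vote over f(x) = f(x + r) - f(r) mod P with r uniform."""
    votes = Counter()
    for _ in range(trials):
        r = rng.randrange(P)
        votes[(oracle((x + r) % P) - oracle(r)) % P] += 1
    return votes.most_common(1)[0][0]

rng = random.Random(0)
corrected = [self_corrected(faulty_oracle, x, trials=61, rng=rng) for x in range(P)]
```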

arXiv cs.LG TIER_1 English(EN) · Qian Xie, Linda Cai, Alexander Terenin, Peter I. Frazier, Ziv Scully · 2026-06-01 04:00

Cost-Aware Stopping for Bayesian Optimization

arXiv:2507.12453v5 Announce Type: replace Abstract: In automated machine learning, scientific discovery, and other applications of Bayesian optimization, deciding when to stop evaluating expensive black-box functions in a cost-aware manner is an important but underexplored practi…
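
Here is a simple heuristic baseline that makes "cost-aware stopping" concrete. It is not the policy studied in the paper. The rule stops when the largest expected improvement (EI) over the candidate set no longer exceeds the cost of one more evaluation. The sketch assumes a Gaussian posterior, with mean `mu` and standard deviation `sigma`, for each candidate, as a Gaussian-process surrogate would provide. The candidate list and cost are placeholders.

```python
import math


def expected_improvement(mu, sigma, best):
    """Expected improvement over `best` for maximization, posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))        # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return (mu - best) * cdf + sigma * pdf


def should_stop(candidates, best, cost):
    """Heuristic rule: stop once no candidate's EI exceeds the cost of one evaluation.

    `candidates` is a list of (mu, sigma) pairs, e.g. from a Gaussian-process
    posterior (hypothetical input format).
    """
    return max(expected_improvement(mu, sigma, best) for mu, sigma in candidates) <= cost
```

Consider a single candidate whose posterior mean equals the incumbent and whose standard deviation is 1. Its EI equals the standard normal density at zero, about 0.399. This rule therefore stops when an evaluation costs 0.5 and continues when it costs 0.1.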

arXiv cs.LG TIER_1 English(EN) · Dai Hai Nguyen, Duc Dung Nguyen, Atsuyoshi Nakamura, Hiroshi Mamitsuka · 2026-06-01 04:00

Accelerated Multiple Wasserstein Gradient Flows for Multi-Objective Distributional Optimization

arXiv:2601.19220v2 Announce Type: replace Abstract: We study multi-objective optimization over probability distributions in Wasserstein space. Recently, Nguyen et al. (2025) introduced Multiple Wasserstein Gradient Descent (MWGraD) algorithm, which exploits the geometric structur…

arXiv cs.LG TIER_1 English(EN) · Yaohong Yang, Sammie Katt, Samuel Kaski · 2026-06-01 04:00

Multi-Objective Bayesian Optimization via Adaptive ε-Constraint Decomposition

arXiv:2604.15959v2 Announce Type: replace Abstract: Multi-objective Bayesian optimization (MOBO) provides a principled framework for optimizing multiple expensive black-box functions. However, existing MOBO methods often struggle with coverage, scalability, and handling constrain…

arXiv cs.LG TIER_1 English(EN) · Hua Li · 2026-05-29 04:00

Gradient Perturbation: Learning to Perturb Gradients for Adaptive Training

arXiv:2605.29494v1 Announce Type: new Abstract: Deep neural network training involves both forward propagation (from features through logits to loss) and backward propagation (from loss through gradients to parameter updates). While perturbations along the forward chain, includin…

arXiv cs.LG TIER_1 English(EN) · Shutong Ding, Yimiao Zhou, Ke Hu, Xi Yao, Junchi Yan, Xiaoying Tang, Ye Shi · 2026-05-29 04:00

A Diffusion-Based Learning Framework for Constrained Nonconvex Optimization with Weighted Bootstrapped Refinement

arXiv:2502.10330v4 Announce Type: replace Abstract: Recent advances in diffusion models show promising potential to accelerate nonconvex problem solving by leveraging their multimodality. However, most existing diffusion-based optimization approaches rely on supervised learning a…

arXiv cs.LG TIER_1 English(EN) · Jisung Hwang, Minhyuk Sung · 2026-05-29 04:00

Gradient Preconditioning for Efficient and Reliable Reward-Guided Generation

arXiv:2602.08646v2 Announce Type: replace Abstract: We propose a gradient preconditioning method that makes reward-guided generation with one-step generative models both efficient and reliable. Test-time noise optimization can unlock substantially better reward-guided generations…

arXiv cs.LG TIER_1 English(EN) · Théotime Le Hellard, Franki Nguimatsia Tiofack, Quentin Le Lidec, Justin Carpentier · 2026-05-29 04:00

Accelerating Trajectory Optimization with Sobolev-Trained Diffusion Policies

arXiv:2604.19011v2 Announce Type: replace Abstract: Trajectory Optimization (TO) solvers exploit known system dynamics to compute locally optimal trajectories through iterative improvements. A downside is that each new problem instance is solved independently; therefore, converge…

arXiv cs.AI TIER_1 English(EN) · Ruoran Xu, Borong She, Xiaobo Jin, Qiufeng Wang · 2026-05-29 04:00

Singularity-Aware Optimization via Stochastic Geometric Probing: Toward Stable Non-Smooth Optimization

arXiv:2605.29547v1 Announce Type: cross Abstract: Deep learning optimization relies heavily on the assumption of smooth loss landscapes, a condition systematically violated by modern architectures due to non-smooth components such as ReLU activations and quantization operators. I…

arXiv cs.LG TIER_1 English(EN) · Luxuan Li, Chunfeng Cui, Xiao Wang · 2026-05-29 04:00

MoSSP: A Momentum-Based Single-Loop Stochastic Penalty Method for Nonconvex Constrained Optimization with DC Regularization

arXiv:2605.29635v1 Announce Type: cross Abstract: In this paper, we study a structured class of nonconvex constrained stochastic problems with difference-of-convex (DC) regularization, where the feasible set is possibly nonconvex and the concave part of the DC regularizer is allo…

arXiv cs.LG TIER_1 English(EN) · Zitao Song, Cedar Site Bai, Zhe Zhang, Brian Bullins, David F. Gleich · 2026-05-28 04:00

Decoupling Variance and Scale-Invariant Updates in Adaptive Gradient Descent for Unified Vector and Matrix Optimization

arXiv:2602.06880v2 Announce Type: replace Abstract: Adaptive methods like Adam have become the de facto standard for large-scale vector and Euclidean optimization due to their coordinate-wise adaptation with a second-order nature. More recently, matrix-based spectral o…

arXiv cs.LG TIER_1 English(EN) · Ivan Bioli, Carlo Marcati, Giancarlo Sangalli · 2026-05-28 04:00

Accelerating Natural Gradient Descent for Physics-Informed Neural Networks with Randomized Numerical Linear Algebra

arXiv:2505.11638v4 Announce Type: replace-cross Abstract: Natural Gradient Descent (NGD) has emerged as a promising optimization algorithm for training neural network-based solvers for partial differential equations (PDEs), such as Physics-Informed Neural Networks (PINNs). Howeve…

arXiv cs.LG TIER_1 English(EN) · Sara Gjorgjieva, Eva Tuba, Tome Eftimov · 2026-05-28 04:00

Learning to Assess the Reliability of Run-Count Estimates in Stochastic Optimization

arXiv:2605.28309v1 Announce Type: new Abstract: In large-scale benchmarking of stochastic optimization algorithms, the key challenge is no longer whether repeated runs are needed for reliability, but how to determine when sufficient evidence has been collected without incurring u…

arXiv cs.LG TIER_1 English(EN) · Jonas Hanselle, Valentin Margraf, Clemens Damke, Eyke Hüllermeier · 2026-05-28 04:00

Unification and Optimization of Robust Supervised Learning

arXiv:2605.28165v1 Announce Type: new Abstract: The literature has proposed various robust alternatives to empirical risk minimisation to address failure modes such as distribution shift, label noise and finite-sample degeneracies. Examples include distributionally robust optimiz…

arXiv cs.LG TIER_1 English(EN) · Zitao Song, Cedar Site Bai, Zhe Zhang, Brian Bullins, David F. Gleich · 2026-05-28 04:00

Can Entry-Wise Clipping Provide Spectral Control of Stochastic Gradients?

arXiv:2605.27733v1 Announce Type: new Abstract: Training instabilities such as loss spikes are frequently the result of stochastic gradient noise. Because of rare expressions in language training data, and multiple layer composition, the noise impact is heavy-tailed and survives …

arXiv cs.LG TIER_1 English(EN) · Mohammed Adnan, Rohan Jain, Tom Jacobs, Ekansh Sharma, Rahul G. Krishnan, Rebekka Burkholz, Yani Ioannou · 2026-05-28 04:00

SparseOpt: Addressing Normalization-Induced Gradient Bias in Sparse Training

arXiv:2605.27541v1 Announce Type: new Abstract: Dynamic Sparse Training (DST) methods train neural networks by maintaining sparsity while dynamically adapting the network topology. Despite the promise of reduced computation, DST methods converge significantly slower than dense tr…

arXiv cs.AI TIER_1 English(EN) · Tinghan Ye, Arnaud Deza, Ved Mohan, El Mehdi Er Raqabi, Pascal Van Hentenryck · 2026-05-28 04:00

Democratizing Large-Scale Re-Optimization with LLM-Guided Model Patching

arXiv:2605.18692v2 Announce Type: replace Abstract: Optimization models developed by operations research (OR) experts are often deployed as decision-support systems in industrial settings. However, real-world environments are dynamic, with evolving business rules and unforeseen p…

arXiv cs.AI TIER_1 English(EN) · Yunwen Lei, Zimeng Wang, Xiaoming Yuan · 2026-05-28 04:00

Stochastic Gradient Descent with Momentum Is Algorithmically Stable

arXiv:2605.28517v1 Announce Type: cross Abstract: Stochastic gradient descent with momentum (SGDM) is one of the most widely used optimization algorithms in machine learning. While optimization properties of SGDM have been extensively studied in the literature, it remains insuffi…
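
Algorithmic stability asks how much an algorithm's output changes when one training example is replaced. The sketch below illustrates this; it does not reproduce the paper's analysis. It runs heavy-ball SGDM (v <- beta*v + g, w <- w - lr*v) on a hypothetical one-dimensional least-squares problem for two datasets that differ in a single example. Both runs use the same random sample order, and the sketch reports the gap between their outputs.

```python
import random


def sgdm(data, lr=0.05, beta=0.9, epochs=20, seed=0):
    """Heavy-ball SGDM on 1-D least squares, loss_i(w) = 0.5 * (w * x_i - y_i) ** 2.

    The seed fixes the sequence of sampled indices, so two runs on
    neighbouring datasets see the same sample order.
    """
    rng = random.Random(seed)
    w, v = 0.0, 0.0
    n = len(data)
    for _ in range(epochs * n):
        x, y = data[rng.randrange(n)]   # sample one example with replacement
        g = (w * x - y) * x             # stochastic gradient of loss_i
        v = beta * v + g                # momentum buffer
        w -= lr * v
    return w


def stability_gap(data, i, replacement, **kwargs):
    """|A(S) - A(S')| where S' replaces the i-th example of S with `replacement`."""
    neighbour = list(data)
    neighbour[i] = replacement
    return abs(sgdm(data, **kwargs) - sgdm(neighbour, **kwargs))


if __name__ == "__main__":
    data = [(0.1 * i, 0.2 * i) for i in range(1, 21)]   # hypothetical data, y = 2x
    print(sgdm(data))                                    # fitted slope
    print(stability_gap(data, 0, (0.1, 5.0)))            # effect of swapping one example
```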

arXiv cs.AI TIER_1 English(EN) · Teodor-Mihai Stupariu, Andrei Manolache · 2026-05-28 04:00

How Optimizers Shape Learned Solutions in Equivariant Neural Networks

arXiv:2605.27662v1 Announce Type: cross Abstract: Equivariant neural networks encode geometric symmetries by construction, yet they are often difficult to optimize and can underperform less constrained architectures. A growing body of work addresses this through architectural mod…

arXiv cs.AI TIER_1 English(EN) · Sai-Aakash Ramesh, Archit Sood, Andrew Corbett, Tim Dodwell · 2026-05-28 04:00

Supervised Distributional Reduction via Optimal Transport and Dependence Maximization

arXiv:2605.27619v1 Announce Type: cross Abstract: Learning representations that capture both intrinsic data geometry and target-relevant structure remains a fundamental challenge, particularly in settings where data reduction must balance compression with predictive fidelity. Whi…

arXiv cs.AI TIER_1 English(EN) · Max Lamparth, Daniel Fein, Andreas Haupt, Marcel Hussing, Mykel J. Kochenderfer · 2026-05-28 04:00

Reward Bias Substitution: Single-Axis Bias Mitigations Redirect Optimization Pressure

arXiv:2605.27996v1 Announce Type: new Abstract: Single-axis mitigations of reward-model biases (e.g., reducing proxy reliance on length, sycophancy, or style) can rotate optimization pressure onto correlated proxies rather than eliminate it, a failure mode we call reward bias sub…

arXiv cs.AI TIER_1 English(EN) · Xiaoming Yuan · 2026-05-27 14:17

Stochastic Gradient Descent with Momentum Is Algorithmically Stable

Stochastic gradient descent with momentum (SGDM) is one of the most widely used optimization algorithms in machine learning. While optimization properties of SGDM have been extensively studied in the literature, it remains insufficiently understood whether and when SGDM can gener…

arXiv cs.NE (Neural & Evolutionary) TIER_1 English(EN) · Tome Eftimov · 2026-05-27 11:08

Learning to Assess the Reliability of Run-Count Estimates in Stochastic Optimization

In large-scale benchmarking of stochastic optimization algorithms, the key challenge is no longer whether repeated runs are needed for reliability, but how to determine when sufficient evidence has been collected without incurring unnecessary computational cost. We study a learni…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-27 04:37

Is Backpropagation Optimal? When Synthetic Gradients Improve Sample Efficiency

Backpropagation is the default learning rule for artificial neural networks and is often treated as the settled approach whenever differentiability is available. In this work, we revisit this convention through a theoretical lens of sample efficiency. We introduce a unified vecto…

arXiv cs.LG TIER_1 English(EN) · Yixuan Yang, Yuqing He, Song Li · 2026-05-27 04:00

Convergence of Spectral Descent for Non-Smooth Optimization

arXiv:2605.26977v1 Announce Type: new Abstract: The Muon optimizer has recently demonstrated remarkable empirical success in training large language models. However, the theoretical understanding of its mechanisms remains limited. Current convergence guarantees for Muon rely heav…
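
The core operation behind Muon-style spectral descent replaces a gradient matrix G = U S V^T by its orthogonal factor U V^T. The minimal NumPy sketch below computes that factor with an exact SVD and, for comparison, with a cubic Newton-Schulz iteration. Muon in practice uses a tuned quintic Newton-Schulz variant, so treat the cubic form and the iteration count as simplifying assumptions. `spectral_descent_step` is a hypothetical helper.

```python
import numpy as np


def orthogonalize_svd(G):
    """Return U V^T from the thin SVD of G (all singular values set to 1)."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt


def orthogonalize_newton_schulz(G, steps=30):
    """Cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X.

    After dividing by the Frobenius norm every singular value lies in (0, 1];
    each step maps sigma -> 1.5*sigma - 0.5*sigma**3, which pushes nonzero
    singular values to 1 while leaving the singular vectors unchanged.
    """
    X = G / np.linalg.norm(G)          # Frobenius norm
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X


def spectral_descent_step(W, G, lr):
    """One spectral-descent update W <- W - lr * U V^T."""
    return W - lr * orthogonalize_svd(G)


if __name__ == "__main__":
    G = np.random.default_rng(0).standard_normal((5, 3))
    O = orthogonalize_svd(G)
    print(np.linalg.svd(O, compute_uv=False))                   # all ones
    print(np.abs(orthogonalize_newton_schulz(G) - O).max())     # tiny
```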

arXiv cs.LG TIER_1 English(EN) · Dmitry Kovalev · 2026-05-27 04:00

Stochastic Non-Smooth Convex Optimization with Unbounded Gradients

arXiv:2605.15522v2 Announce Type: replace-cross Abstract: Much of the existing theory on first-order non-smooth optimization is built on a restrictive assumption that the gradients of the objective function are uniformly bounded. We introduce a much more realistic class of genera…

arXiv cs.LG TIER_1 English(EN) · Fabian Schaipp, Robert M. Gower, Adrien Taylor · 2026-05-27 04:00

Step-Size Stability in Stochastic Optimization: A Theoretical Perspective

arXiv:2602.09842v2 Announce Type: replace-cross Abstract: We present a theoretical analysis of stochastic optimization methods in terms of their sensitivity with respect to the step size. We identify a key quantity that, for each method, describes how the performance degrades as …

arXiv cs.LG TIER_1 English(EN) · Kartik Gupta, Stephen D. Miller, Pradeep Ravikumar, Ramarathnam Venkatesan · 2026-05-27 04:00

Stochastic Global Optimization of Continuous Functions via Random Walks on Grassmannians

arXiv:2605.14151v1 Announce Type: cross Abstract: We introduce a stochastic global optimization method based on random walks on Grassmannian manifolds. To minimize a continuous objective $\ell:\mathbb{R}^d\rightarrow\mathbb{R}$, the method repeatedly samples random $k$-dimensiona…

arXiv cs.LG TIER_1 English(EN) · Kukyoung Jang, Taehyun Cho, Junrui Zhang, Ping Xu, Kyungjae Lee · 2026-05-27 04:00

Probabilistic Smoothing with Ratio-Monotone Transforms for Global Optimization

arXiv:2605.27316v1 Announce Type: new Abstract: Probabilistic smoothing is a standard tool for global optimization, but existing methods rely on Gaussian kernels and specific transforms, often resulting in strong hyperparameter sensitivity and limited robustness. We propose a gen…

arXiv cs.LG TIER_1 English(EN) · Kyungjae Lee · 2026-05-26 17:25

Probabilistic Smoothing with Ratio-Monotone Transforms for Global Optimization

Probabilistic smoothing is a standard tool for global optimization, but existing methods rely on Gaussian kernels and specific transforms, often resulting in strong hyperparameter sensitivity and limited robustness. We propose a general smoothing framework that combines flexible …

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-26 17:25

Probabilistic Smoothing with Ratio-Monotone Transforms for Global Optimization

Probabilistic smoothing is a standard tool for global optimization, but existing methods rely on Gaussian kernels and specific transforms, often resulting in strong hyperparameter sensitivity and limited robustness. We propose a general smoothing framework that combines flexible …

arXiv cs.LG TIER_1 English(EN) · Song Li · 2026-05-26 13:02

Convergence of Spectral Descent for Non-Smooth Optimization

The Muon optimizer has recently demonstrated remarkable empirical success in training large language models. However, the theoretical understanding of its mechanisms remains limited. Current convergence guarantees for Muon rely heavily on smoothness assumptions, leaving its non-s…

arXiv cs.LG TIER_1 English(EN) · Ziyue Chen, David Šiška, Lukasz Szpruch · 2026-05-26 04:00

Global Linear Convergence of Entropy-Regularized Softmax Policy Gradient Beyond Tabular MDPs

arXiv:2605.24939v1 Announce Type: new Abstract: We study the global convergence of policy gradient for infinite-horizon entropy-regularized Markov decision processes (MDPs) with continuous state and action spaces. We consider log-linear softmax policies with linear function appro…
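
Consider the tabular, single-state (bandit) special case. There, entropy-regularized softmax policy gradient maximizes J(theta) = sum_a pi_a (r_a - tau log pi_a) with pi = softmax(theta), and the maximizer is pi* proportional to exp(r / tau). The sketch runs exact gradient ascent on this objective. The rewards, temperature tau, and step size are illustrative choices. The continuous-space, log-linear setting analyzed in the paper is not covered.

```python
import math


def softmax(z):
    m = max(z)                                   # shift for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def entropy_reg_pg(rewards, tau, lr, steps):
    """Exact gradient ascent on J(theta) = sum_a pi_a * (r_a - tau * log pi_a)."""
    theta = [0.0] * len(rewards)
    for _ in range(steps):
        pi = softmax(theta)
        c = [r - tau * math.log(p) for r, p in zip(rewards, pi)]   # soft action values
        baseline = sum(p * ci for p, ci in zip(pi, c))
        grad = [p * (ci - baseline) for p, ci in zip(pi, c)]       # dJ/dtheta_b
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return softmax(theta)


if __name__ == "__main__":
    print(entropy_reg_pg([1.0, 0.5, 0.0], tau=0.5, lr=0.1, steps=20000))
    print(softmax([2.0, 1.0, 0.0]))               # closed-form optimum exp(r / tau)
```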

arXiv cs.LG TIER_1 English(EN) · Matan Schliserman, Shira Vansover-Hager, Tomer Koren · 2026-05-26 04:00

Flat Minima and Generalization: Insights from Stochastic Convex Optimization

arXiv:2511.03548v2 Announce Type: replace Abstract: Understanding the generalization behavior of learning algorithms is a central goal of learning theory. A recently emerging explanation is that learning algorithms are successful in practice because they converge to flat minima, …

arXiv cs.LG TIER_1 English(EN) · Enea Monzio Compagnoni, Rustem Islamov, Frank Norbert Proske, Aurelien Lucchi, Antonio Orvieto, Eduard Gorbunov · 2026-05-26 04:00

On the Interplay of Batch Noise, Adaptivity, and Compression under $(L_0,L_1)$-Smoothness: An SDE Approach

arXiv:2506.00181v2 Announce Type: replace Abstract: Distributed stochastic optimization intertwines (i) stochastic gradient noise, (ii) communication compression, and (iii) adaptive/normalized updates. While each factor has been studied in isolation, their joint effect under real…

arXiv cs.LG TIER_1 English(EN) · Jose Blanchet, Peter Glynn, Wenhao Yang · 2026-05-26 04:00

Statistical Inference for Stochastic Gradient Descent Beyond Finite Variance

arXiv:2605.26000v1 Announce Type: cross Abstract: Stochastic gradient descent (SGD) is a foundational algorithm for large-scale statistical learning and stochastic optimization. However, statistical inference based on SGD iterates remains challenging when stochastic gradients hav…

arXiv cs.LG TIER_1 English(EN) · Khen Cohen, Mark Glass, Meir Feder, Yaron Oz · 2026-05-26 04:00

Implicit Binarization via Complex Phase Dynamics for Combinatorial Optimization

arXiv:2605.24502v1 Announce Type: cross Abstract: We introduce a physics-inspired continuous relaxation framework that yields substantially improved solutions for NP-hard combinatorial optimization problems, including Quadratic Unconstrained Binary Optimization (QUBO), binary spa…

arXiv cs.LG TIER_1 English(EN) · Chung-Yiu Yau, Dawei Li, Athanasios Glentis, Valentyn Boreiko, Hoi-To Wai, Mingyi Hong · 2026-05-26 04:00

EMA-Nesterov: Stabilizing Nesterov Lookahead for Accelerated Deep Learning Optimization

arXiv:2605.25395v1 Announce Type: new Abstract: Lookahead-based acceleration methods, such as Nesterov's momentum, are widely used in optimization, but they often become unreliable in deep learning training mainly due to stochastic gradient noise and non-convex loss landscapes. I…
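
The lookahead mechanism the abstract refers to is easiest to see in the classical, deterministic Nesterov scheme, which evaluates the gradient at y_k = x_k + beta (x_k - x_{k-1}) rather than at x_k. The sketch compares it with plain gradient descent on an ill-conditioned quadratic, using condition number kappa = 100, step 1/L, and beta = (sqrt(kappa) - 1)/(sqrt(kappa) + 1). The quadratic and parameters are illustrative assumptions. The EMA-based stabilization proposed in the paper is not reproduced.

```python
CURV = (1.0, 0.01)   # f(x) = 0.5 * sum(c_i * x_i**2); L = 1, kappa = 100 (illustrative)


def grad(x):
    return [c * xi for c, xi in zip(CURV, x)]


def loss(x):
    return 0.5 * sum(c * xi ** 2 for c, xi in zip(CURV, x))


def gd(x0, lr, steps):
    x = list(x0)
    for _ in range(steps):
        x = [xi - lr * gi for xi, gi in zip(x, grad(x))]
    return x


def nesterov(x0, lr, beta, steps):
    """Classical Nesterov: step along the momentum first, then take a gradient step there."""
    x, x_prev = list(x0), list(x0)
    for _ in range(steps):
        y = [xi + beta * (xi - xp) for xi, xp in zip(x, x_prev)]   # lookahead point
        x_prev, x = x, [yi - lr * gi for yi, gi in zip(y, grad(y))]
    return x


if __name__ == "__main__":
    beta = (10.0 - 1.0) / (10.0 + 1.0)          # sqrt(kappa) = 10
    print(loss(gd([1.0, 1.0], 1.0, 100)))
    print(loss(nesterov([1.0, 1.0], 1.0, beta, 100)))
```

After 100 steps, gradient descent shrinks the slow coordinate only by 0.99^100, about 0.37. On this quadratic, Nesterov's recursion has a double root at 0.9, which drives the same coordinate to roughly 3e-4.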

arXiv cs.LG TIER_1 English(EN) · Yudong W. Xu, Wenhao Li, Xiaoyu Wang, Scott Sanner, Elias B. Khalil · 2026-05-26 04:00

Blocked Gibbs Meets Diffusion Transformers: Unsupervised Learning for Constrained Optimization

arXiv:2605.25129v1 Announce Type: new Abstract: Diffusion models have shown promise in learning to solve constraint optimization problems. However, they are mostly restricted to problems with binary variables and rely on graph neural networks, hindering their application to a bro…

arXiv cs.LG TIER_1 English(EN) · Zhuanghua Liu, Luo Luo · 2026-05-26 04:00

Zeroth-Order Nonconvex Nonsmooth Optimization with Heavy-Tailed Noise

arXiv:2605.24513v1 Announce Type: new Abstract: This paper considers the nonconvex nonsmooth problem in which the objective function is Lipschitz continuous. We focus on the stochastic setting where the algorithm can access stochastic function value evaluations with heavy-tailed …
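
Zeroth-order methods replace gradients with finite differences of function values. A common building block is the two-point Gaussian-smoothing estimator g = (f(x + mu u) - f(x - mu u)) / (2 mu) * u with u ~ N(0, I), sketched here under the assumption of exact, noise-free evaluations. The paper's handling of heavy-tailed evaluation noise is not reproduced. On a quadratic the symmetric difference is exact, so averaging many estimates recovers the true gradient.

```python
import random


def zo_grad(f, x, mu=1e-3, rng=random):
    """Two-point Gaussian-smoothing gradient estimate of f at x.

    Its expectation is the gradient of the smoothed function
    f_mu(x) = E[f(x + mu * u)], u ~ N(0, I).
    """
    u = [rng.gauss(0.0, 1.0) for _ in x]
    xp = [xi + mu * ui for xi, ui in zip(x, u)]
    xm = [xi - mu * ui for xi, ui in zip(x, u)]
    scale = (f(xp) - f(xm)) / (2.0 * mu)      # directional derivative along u
    return [scale * ui for ui in u]


def average_estimate(f, x, n, rng):
    """Average n independent estimates (reduces variance, not bias)."""
    acc = [0.0] * len(x)
    for _ in range(n):
        acc = [a + g for a, g in zip(acc, zo_grad(f, x, rng=rng))]
    return [a / n for a in acc]


if __name__ == "__main__":
    f = lambda v: sum(t * t for t in v)        # true gradient at x is 2x
    print(average_estimate(f, [1.0, -2.0, 3.0], 20000, random.Random(0)))
```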

arXiv cs.AI TIER_1 English(EN) · Chen Liang, Xiatao Sun, Qian Wang, Daniel Rakita · 2026-05-26 04:00

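Turning Stale Gradients into Stable Gradients: Coherent Coordinate Descent with Implicit Landscape Smoothing for Lightweight Zeroth-Order Optimization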

arXiv:2605.14373v2 Announce Type: replace-cross Abstract: Zeroth-Order (ZO) optimization is pivotal for scenarios where backpropagation is unavailable, such as memory-constrained on-device learning and black-box optimization. However, existing methods face a stark trade-off: they…

arXiv cs.AI TIER_1 English(EN) · Haoyu Huang, Boyu Liu, Linlin Yang, Yanjing Li, Yuguang Yang, Xuhui Liu, Canyu Chen, Zhongqian Fu, Baochang Zhang · 2026-05-26 04:00

SURGE: Surrogate Gradient Adaptation in Binary Neural Networks

arXiv:2605.10989v3 Announce Type: replace-cross Abstract: The training of Binary Neural Networks (BNNs) is fundamentally based on gradient approximation for non-differentiable binarization operations (e.g., sign function). However, prevailing methods including the Straight-Throug…

arXiv cs.AI TIER_1 English(EN) · Chinmay Maheshwari, Chinmay Pimpalkhare, Debasish Chatterjee · 2026-05-26 04:00

EXOTIC: An Exact, Optimistic, Tree-Based Algorithm for Min-Max Optimization

arXiv:2508.12479v2 Announce Type: replace-cross Abstract: Min-max optimization arises in many domains such as game theory, adversarial machine learning, etc. For these problems, gradient-based methods are well understood and enjoy strong guarantees. However, in the absence of con…

arXiv cs.AI TIER_1 English(EN) · Huangyu Xu, Jingqin Yang, Qianqian Xu, Jiaye Teng · 2026-05-26 04:00

A Theoretical Analysis of Sparse Optimization under Reparameterization, Weight Decay, and Adaptive Learning Rates

arXiv:2605.25134v1 Announce Type: cross Abstract: Sparse optimization is a fundamental challenge in various practical applications. A popular approach to sparse optimization is $\ell_p$ regularization. However, it may encounter optimization instability due to the unbounded gradie…

arXiv cs.LG TIER_1 English(EN) · Yequan Zhao, Ruijie Zhang, Liyan Tan, Niall Moran, Tong Qin, Zheng Zhang · 2026-05-25 04:00

FuRA: Full-Rank Parameter-Efficient Fine-Tuning with Spectral Preconditioning

arXiv:2605.22869v1 Announce Type: new Abstract: Both full fine-tuning (Full FT) and parameter-efficient fine-tuning methods such as LoRA introduce weight updates without accounting for the spectral structure established during pretraining. As a result, noisy gradients from limite…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-25 03:39

EMA-Nesterov: Stabilizing Nesterov Lookahead for Accelerated Deep Learning Optimization

Lookahead-based acceleration methods, such as Nesterov's momentum, are widely used in optimization, but they often become unreliable in deep learning training mainly due to stochastic gradient noise and non-convex loss landscapes. In particular, standard lookahead relies on short…

arXiv cs.LG TIER_1 English(EN) · Alexander Tyurin · 2026-05-22 04:00

Near-Optimal Convergence of Accelerated Gradient Methods under Generalized and $(L_0, L_1)$-Smoothness

arXiv:2508.06884v2 Announce Type: replace-cross Abstract: We study first-order methods for convex optimization problems with functions $f$ satisfying the recently proposed $\ell$-smoothness condition $||\nabla^{2}f(x)|| \le \ell\left(||\nabla f(x)||\right),$ which generalizes the…

arXiv cs.LG TIER_1 English(EN) · Zhuo Chen (equal contribution), Xinzhe Yuan (equal contribution), Jianshu Zhang (Shanghai Artificial Intelligence Laboratory, Shanghai, China, School of Computer Science, Shanghai Jiao Tong University, Shanghai, China), Jinzong Dong (Shanghai Artificial … · 2026-05-22 04:00

LABO: LLM-Accelerated Bayesian Optimization via Broad Exploration and Selective Experimentation

arXiv:2605.22054v1 Announce Type: new Abstract: The high cost and data scarcity in scientific exploration have motivated the use of large language models (LLMs) as knowledge-driven components in Bayesian optimization (BO). However, existing approaches typically embed LLMs directl…

arXiv cs.LG TIER_1 English(EN) · Ryan Cory-Wright, Jean Pauphilet · 2026-05-22 04:00

Compact Lifted Relaxations for Low-Rank Optimization

arXiv:2603.20228v2 Announce Type: replace-cross Abstract: We develop tractable convex relaxations for rank-constrained quadratic optimization problems over $n \times m$ matrices, a setting for which tractable relaxations are typically only available when the objective or constrai…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-19 22:11

Ada2MS: A Hybrid Optimizer Based on Exponential Mixing of Element-Wise and Global Second-Moment Estimates

Optimization algorithms are core methods by which machine learning models iteratively minimize loss functions, update parameters, learn from data, and improve performance. Momentum SGD and AdamW represent two important optimization paradigms. AdamW produces stable updates and usu…

arXiv cs.LG TIER_1 English(EN) · Jalal Etesami · 2026-05-19 11:00

Convergence of a Consensus-Based Particle Method for Nonconvex Bi-Level Optimization

In this paper, we study a consensus-based optimization method for nonconvex bi-level optimization, where the objective is to minimize an upper-level function over the set of global minimizers of a lower-level problem. The proposed approach is derivative-free, and constructs its c…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-19 11:00

Convergence of a Consensus-Based Particle Method for Nonconvex Bi-Level Optimization

In this paper, we study a consensus-based optimization method for nonconvex bi-level optimization, where the objective is to minimize an upper-level function over the set of global minimizers of a lower-level problem. The proposed approach is derivative-free, and constructs its c…

arXiv cs.NE (Neural & Evolutionary) TIER_1 English(EN) · Shinichi Shirakawa · 2026-05-18 06:31

Adaptive Stochastic Natural Gradient Method for Safe Optimization in Binary Spaces

Optimization problems in real-world applications across the medical and engineering domains often involve potential risks when evaluating candidate solutions. Safe optimization aims to perform optimization while suppressing unsafe solution evaluations in such situations. For cont…

arXiv cs.LG TIER_1 English(EN) · Frank Liu · 2026-05-15 14:50

Accelerating Gradient Descent for Faster Convergence with Minimal Overhead

In this paper, we present CT-AGD (Curvature-Tuned Accelerated Gradient Descent), an optimization method for non-convex optimization problems in deep learning training tasks. CT-AGD is a general boosting procedure that accelerates first-order methods by explicitly capturing the lo…

arXiv stat.ML TIER_1 English(EN) · Kateřina Henclová, Václav Šmídl · 2026-06-12 04:00

GEMSS: A Variational Bayesian Method for Discovering Multiple Sparse Solutions in Classification and Regression Problems

arXiv:2602.08913v2 Announce Type: replace-cross Abstract: High-dimensional, underdetermined and highly correlated systems are common in data science practice, especially when analyzing physical measurements. In such settings, feature selection poses a fundamental challenge becaus…

arXiv stat.ML TIER_1 English(EN) · James Cuin, Davide Carbone, Yanbo Tang, O. Deniz Akyildiz · 2026-06-12 04:00

Efficient Stochastic Optimization via Sequential Monte Carlo

arXiv:2601.22003v2 Announce Type: replace Abstract: The problem of optimising functions with intractable gradients frequently arises in machine learning and statistics, ranging from maximum marginal likelihood estimation procedures to fine-tuning of generative models. Stochastic …

arXiv stat.ML TIER_1 English(EN) · Dimitra Maoutsa · 2026-06-12 04:00

From Geometry to Dynamics: Learning Overdamped Langevin Dynamics with Geometric Constraints from Sparse Observations

arXiv:2512.23566v2 Announce Type: replace-cross Abstract: How can we learn the laws underlying the dynamics of stochastic systems when their trajectories are sampled sparsely in time? Existing methods either require temporally resolved high-frequency observations, or rely on geom…

arXiv stat.ML TIER_1 English(EN) · Noah Golowich, Ankur Moitra, Dhruv Rohatgi · 2026-06-11 04:00

The Power of Test-Time Training for Approximate Sampling

arXiv:2606.11437v1 Announce Type: cross Abstract: Efficiently sampling from a complex probability distribution is a fundamental problem which has become increasingly pertinent in recent years with the rise of generative AI, as sophisticated sampling procedures from LLMs have been…

arXiv stat.ML TIER_1 English(EN) · Susmit Sarkar, Abhinav Raghuvanshi, Kushal Chakrabarti, Mayank Baranwal · 2026-06-11 04:00

A Quantized Stochastic Primal-Dual Method for Distributed Optimization under Relaxed Global Geometry

arXiv:2606.11339v1 Announce Type: cross Abstract: We study distributed optimization with stochastic gradients and finite-bit communication modeled by random (unbiased) quantization. We propose q-PDGD, a quantized stochastic primal-dual method, and analyze it under relaxed global …
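
The abstract models finite-bit communication as random, unbiased quantization. The snippet does not specify the paper's quantizer, so the sketch below assumes one standard construction: stochastic rounding onto a uniform grid. A value is rounded up with probability equal to its fractional position between the two neighbouring grid points, so the quantized value equals the input in expectation.

```python
import math
import random


def stochastic_quantize(x, levels, lo, hi, rng=random):
    """Unbiased random quantization of a scalar onto `levels` uniform points in [lo, hi].

    For lo <= x <= hi, E[Q(x)] = x. Inputs outside the range are clipped first,
    which introduces bias there.
    """
    step = (hi - lo) / (levels - 1)
    pos = (min(max(x, lo), hi) - lo) / step   # position in grid units
    k = math.floor(pos)
    frac = pos - k                            # distance past the lower grid point
    if rng.random() < frac:                   # round up with probability frac
        k += 1
    return lo + k * step


if __name__ == "__main__":
    rng = random.Random(0)
    qs = [stochastic_quantize(0.37, 5, 0.0, 1.0, rng) for _ in range(20000)]
    print(sorted(set(qs)), sum(qs) / len(qs))  # only 0.25 and 0.5; mean near 0.37
```

With 5 levels each message needs only 3 bits (a grid index). The price of unbiasedness is extra variance, which a quantized method's analysis has to absorb.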

arXiv stat.ML TIER_1 English(EN) · Junzhuo Gao, Ling Peng, Xu Guo, Heng Lian · 2026-06-11 04:00

Renewable Lasso without Batch-Number Constraints: A Gradient-Enhanced Approach

arXiv:2606.11738v1 Announce Type: new Abstract: We study online estimation for high-dimensional generalized linear models with streaming data. First, for the non-distributed setting, we propose a gradient-enhanced surrogate loss that approximates the cumulative loss using only hi…

arXiv stat.ML TIER_1 English(EN) · Heng Lian · 2026-06-10 07:15

Renewable Lasso without Batch-Number Constraints: A Gradient-Enhanced Approach

We study online estimation for high-dimensional generalized linear models with streaming data. First, for the non-distributed setting, we propose a gradient-enhanced surrogate loss that approximates the cumulative loss using only historical summaries, which modifies and improves …

arXiv stat.ML TIER_1 English(EN) · Yiwei Zhou, Ziheng Chen · 2026-06-10 04:00

Deterministic Denominator Design for Localized Tamed Stochastic-Gradient Langevin Dynamics

arXiv:2606.10559v1 Announce Type: cross Abstract: Tamed stochastic-gradient Langevin dynamics (SGLD) stabilizes large drifts by adding a denominator to the update. If this denominator uses the same stochastic-gradient sample as the update step, it can also change the conditional …
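
Below is a minimal sketch of a tamed SGLD step in its basic form. Here the denominator reuses the same stochastic gradient as the numerator, which is exactly the coupling the abstract flags. The deterministic denominators studied in the paper are not reproduced. The potential U(theta) = theta^4 / 4, the Gaussian gradient noise, and the step size are illustrative assumptions. With h = 0.1 and theta = 10, an untamed Euler step would move by h * theta^3 = 100, whereas the tamed drift h*g / (1 + h*|g|) is always smaller than 1 in magnitude.

```python
import math
import random


def tamed_sgld(grad_estimate, theta0, h, n_steps, rng):
    """Tamed SGLD for a scalar parameter.

    theta <- theta - h * g / (1 + h * |g|) + sqrt(2h) * xi,  xi ~ N(0, 1),
    where the same stochastic gradient g is used in numerator and denominator.
    """
    theta, path = theta0, [theta0]
    for _ in range(n_steps):
        g = grad_estimate(theta, rng)
        drift = h * g / (1.0 + h * abs(g))            # bounded by 1 in magnitude
        theta = theta - drift + math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0)
        path.append(theta)
    return path


def noisy_grad(theta, rng):
    """Stochastic gradient of U(theta) = theta**4 / 4 (superlinear drift) plus N(0, 1) noise."""
    return theta ** 3 + rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    path = tamed_sgld(noisy_grad, 10.0, 0.1, 2000, random.Random(0))
    print(path[:3], max(abs(t) for t in path))
```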

arXiv stat.ML TIER_1 English(EN) · Morris Trestman, Stefan Gugler, Felix A. Faber, O. A. von Lilienfeld · 2026-06-10 04:00

Gradient-Guided Furthest Point Sampling for Robust Training Set Selection

arXiv:2510.08906v2 Announce Type: replace Abstract: Training set sampling methods are used to improve model performance and lower data costs in machine learning problems relevant to chemistry. We introduce Gradient Guided Furthest Point Sampling (GGFPS), a simple extension of Fur…

arXiv stat.ML TIER_1 English(EN) · Sobihan Surendran (LPSM), Adeline Fermanian (LPSM), Sylvain Le Corff (LPSM) · 2026-06-10 04:00

Latent Guided Sampling for Combinatorial Optimization

arXiv:2506.03672v2 Announce Type: replace Abstract: Combinatorial Optimization problems are widespread in domains such as logistics, manufacturing, and drug discovery, yet their NP-hard nature makes them computationally challenging. Recent Neural Combinatorial Optimization (NCO) …

arXiv stat.ML TIER_1 English(EN) · Sasan Vakili, Daniël Woonings, Pradyumna Paruchuri, Peyman Mohajerin Esfahani · 2026-06-10 04:00

Nonlinear Estimator: Twin Bayesian Affine Estimators for Parameter Learning

arXiv:2606.10111v1 Announce Type: cross Abstract: This paper presents a nonlinear parameter estimator for Wiener-type state-space models obtained as a fixed-point architecture that couples two affine minimum mean-squared error (MMSE) estimators: one for the unknown parameters and…

arXiv stat.ML TIER_1 English(EN) · Gil Goldshlager, Jiang Hu, Lin Lin · 2026-06-10 04:00

A Sketch-and-Project Analysis of Subsampled Natural Gradient Algorithms

arXiv:2508.21022v3 Announce Type: replace-cross Abstract: Subsampled natural gradient descent (SNG) has been used to enable high-precision scientific machine learning, but standard analyses based on stochastic preconditioning fail to provide insight into realistic small-sample se…

arXiv stat.ML TIER_1 English(EN) · Yahong Yang, Juncai He · 2026-06-10 04:00

Deeper or Wider: A Perspective from Optimal Generalization Error with Sobolev Loss

arXiv:2402.00152v5 Announce Type: replace-cross Abstract: Constructing the architecture of a neural network is a challenging pursuit for the machine learning community, and the dilemma of whether to go deeper or wider remains a persistent question. This paper explores a compariso…

arXiv stat.ML TIER_1 English(EN) · Marc Becker, Lennart Schneider, Martin Binder, Lars Kotthoff, Bernd Bischl · 2026-06-10 04:00

mlr3mbo: Bayesian Optimization in R

arXiv:2603.29730v2 Announce Type: replace Abstract: We present mlr3mbo, a modular toolbox for Bayesian optimization in R. mlr3mbo supports single- and multi-objective optimization, multi-point proposals, batch and asynchronous parallelization, and robust error handling. While it …

arXiv stat.ML TIER_1 English(EN) · Dhruv Rohatgi · 2026-06-09 20:48

The Power of Test-Time Training for Approximate Sampling

Efficiently sampling from a complex probability distribution is a fundamental problem which has become increasingly pertinent in recent years with the rise of generative AI, as sophisticated sampling procedures from LLMs have been proposed to solve challenging reasoning problems.…

arXiv stat.ML TIER_1 English(EN) · Mayank Baranwal · 2026-06-09 18:18

A Quantized Stochastic Primal-Dual Method for Distributed Optimization under Relaxed Global Geometry

We study distributed optimization with stochastic gradients and finite-bit communication modeled by random (unbiased) quantization. We propose q-PDGD, a quantized stochastic primal-dual method, and analyze it under relaxed global geometry. Under restricted secant inequality (RSI)…

arXiv stat.ML TIER_1 English(EN) · Ziheng Chen · 2026-06-09 08:25

Deterministic Denominator Design for Localized Tamed Stochastic-Gradient Langevin Dynamics

Tamed stochastic-gradient Langevin dynamics (SGLD) stabilizes large drifts by adding a denominator to the update. If this denominator uses the same stochastic-gradient sample as the update step, it can also change the conditional mean drift. We study deterministic denominators: t…

arXiv stat.ML TIER_1 English(EN) · Giorgio Giannone, Mustafa Eyceoz, Shabana Baig, Shivchander Sudalairaj, Anna C. Doris, Faez Ahmed, Akash Srivastava, Kai Xu · 2026-06-09 04:00

Intrinsic Selection and Particle Resampling for Inference-Time Scaling beyond Verifiable Domains

arXiv:2606.08850v1 Announce Type: cross Abstract: Inference-Time Scaling (ITS) has largely succeeded in verifiable domains like math and coding, where cheap verification enables scalable output selection. However, extending ITS to tasks prone to systematic failure - driven by fau…

arXiv stat.ML TIER_1 English(EN) · Filip Kovačević, Hong Chang Ji, Denny Wu, Mahdi Soltanolkotabi, Marco Mondelli · 2026-06-09 04:00

Full-Batch Gradient Descent Outperforms One-Pass SGD: A Sample-Complexity Separation in Single-Index Learning

arXiv:2602.02431v2 Announce Type: replace Abstract: It is folklore that reusing training data more than once can improve the statistical efficiency of gradient-based learning. While this phenomenon has been extensively studied in linear regression, the benefit of multi-pass gradi…

arXiv stat.ML TIER_1 English(EN) · Tuan A. Vu, Harri Lähdesmäki, Julien Martinelli · 2026-06-09 04:00

In-Context Learning for Latent-Space Bayesian Optimization

arXiv:2606.09664v1 Announce Type: cross Abstract: Bayesian optimization (BO) is a central tool for sample-efficient design, and latent-space Bayesian optimization (LSBO) extends it to structured objects such as molecules and proteins. In parallel, tabular foundation models such a…

arXiv stat.ML TIER_1 English(EN) · Federico Bassetti, Vassili De Palma, Lucia Ladelli · 2026-06-09 04:00

Large Deviation Principles for Convolutional Bayesian Neural Networks

arXiv:2603.06023v2 Announce Type: replace-cross Abstract: While suitably scaled CNNs with Gaussian initialization are known to converge to Gaussian processes as the number of channels diverges, little is known beyond this Gaussian limit. We establish a large deviation principle (…

arXiv stat.ML TIER_1 English(EN) · Trevor Campbell, Jonathan H. Huggins, Kyurae Kim, Charles C. Margossian · 2026-06-09 04:00

Large-Scale Empirical Tuning and Comparison of Default Optimizers for Variational Inference

arXiv:2606.07841v1 Announce Type: cross Abstract: Black-box variational inference (BBVI) is a methodology for posterior approximation that relies on stochastic optimization. In practice, the stochastic optimizers underpinning BBVI generally require extensive problem-specific tuni…
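
Black-box variational inference turns posterior approximation into stochastic optimization of the ELBO. The sketch below shows that optimization problem in its simplest form: a Gaussian q(z) = N(m, exp(omega)^2) fitted by plain SGD with reparameterized Monte Carlo gradients. The target density, step size, sample count, and iteration budget are hypothetical choices, and plain SGD stands in for whichever default optimizer one would actually tune. Nothing here reflects the paper's recommendations.

```python
import math
import random


def bbvi_gaussian(grad_log_p, steps=5000, lr=0.01, n_mc=16, seed=0):
    """Reparameterized SGD on the ELBO for q(z) = N(m, s^2), s = exp(omega).

    ELBO(m, omega) = E_q[log p(z)] + omega + const, with z = m + s * eps,
    eps ~ N(0, 1); the "+ omega" term is the Gaussian entropy.
    """
    rng = random.Random(seed)
    m, omega = 0.0, 0.0
    for _ in range(steps):
        s = math.exp(omega)
        gm = gw = 0.0
        for _ in range(n_mc):
            eps = rng.gauss(0.0, 1.0)
            g = grad_log_p(m + s * eps)   # d log p / dz at the sampled z
            gm += g                       # chain rule: dz/dm = 1
            gw += g * eps * s             # chain rule: dz/domega = eps * s
        m += lr * gm / n_mc
        omega += lr * (gw / n_mc + 1.0)   # +1 is the entropy gradient in omega
    return m, math.exp(omega)


if __name__ == "__main__":
    # Hypothetical target p(z) = N(3, 2^2), so the exact answer is m = 3, s = 2.
    print(bbvi_gaussian(lambda z: -(z - 3.0) / 4.0))
```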

arXiv stat.ML TIER_1 English(EN) · Wei-Cheng Lee, Francesco Orabona · 2026-06-09 04:00

Robust $\widetilde{\mathcal{O}}(1/\sqrt{T})$ Convergence Rate for Projection-Free TD Learning with Linear Function Approximation

arXiv:2506.01052v3 Announce Type: replace-cross Abstract: We investigate the finite-time convergence properties of Temporal Difference (TD) learning with linear function approximation, a cornerstone of reinforcement learning. We are interested in the so-called "robust" setting,…
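
For reference, the projection-free TD(0) update with linear function approximation, V(s) = phi(s)^T theta and theta <- theta + alpha (r + gamma V(s') - V(s)) phi(s), takes only a few lines. The two-state cycle, the one-hot features (which make this the tabular special case), the discount, and the step size are illustrative assumptions. The paper's robust step-size analysis is not reproduced. On this cycle the exact values are V(1) = 1 / (1 - gamma^2) and V(0) = gamma V(1).

```python
def td0_linear(phi, transitions, gamma, alpha, n_passes):
    """Projection-free TD(0) with linear values V(s) = phi(s) . theta.

    theta <- theta + alpha * (r + gamma * V(s') - V(s)) * phi(s); no projection step.
    """
    d = len(phi(transitions[0][0]))
    theta = [0.0] * d
    for _ in range(n_passes):
        for s, r, s_next in transitions:
            f, f_next = phi(s), phi(s_next)
            v = sum(a * b for a, b in zip(f, theta))
            v_next = sum(a * b for a, b in zip(f_next, theta))
            delta = r + gamma * v_next - v             # TD error
            theta = [t + alpha * delta * fi for t, fi in zip(theta, f)]
    return theta


def one_hot(s, n=2):
    return [1.0 if i == s else 0.0 for i in range(n)]


if __name__ == "__main__":
    # Deterministic cycle: 0 -> 1 with reward 0, then 1 -> 0 with reward 1.
    cycle = [(0, 0.0, 1), (1, 1.0, 0)]
    print(td0_linear(one_hot, cycle, gamma=0.9, alpha=0.1, n_passes=5000))
```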

arXiv stat.ML TIER_1 English(EN) · Jaehoan Kim, Anirban Bhattacharya, Debdeep Pati · 2026-06-09 04:00

Adaptive Resolution in Finite-Rank Gaussian Processes

arXiv:2505.24066v2 Announce Type: replace-cross Abstract: Finite-rank approximations are widely used to scale Gaussian process (GP) regression, but their posterior behavior can differ from that of the corresponding parent GP prior. We study a class of finite-rank GP priors built …

arXiv stat.ML TIER_1 English(EN) · Peyman Mohajerin Esfahani · 2026-06-08 19:41

A Nonlinear Estimator: Dual Bayesian Affine Estimators for Parameter Learning

This paper presents a nonlinear parameter estimator for Wiener-type state-space models obtained as a fixed-point architecture that couples two affine minimum mean-squared error (MMSE) estimators: one for the unknown parameters and one for latent variables. The architecture retain…

arXiv stat.ML TIER_1 English(EN) · Julien Martinelli · 2026-06-08 15:45

In-Context Learning for Latent-Space Bayesian Optimization

Bayesian optimization (BO) is a central tool for sample-efficient design, and latent-space Bayesian optimization (LSBO) extends it to structured objects such as molecules and proteins. In parallel, tabular foundation models such as TabPFN and TabICL now achieve state-of-the-art r…

arXiv stat.ML TIER_1 English(EN) · Qianqian Lei, Soham Bonnerjee, Yuefeng Han, Wei Biao Wu · 2026-06-08 04:00

Sharp Generalization Bounds under Finite $L_p$ Moments: Stability beyond Bounded Differences

arXiv:2606.06855v1 Announce Type: new Abstract: While algorithmic stability is a central tool for understanding generalization of learning algorithms, existing high-probability guarantees typically rely on uniform boundedness or sub-Gaussian/sub-Weibull tail assumptions, which ca…

arXiv stat.ML TIER_1 English(EN) · Kai Xu · 2026-06-07 21:43

Intrinsic Selection and Particle Resampling for Inference-Time Scaling beyond Domain Verifiability

Inference-Time Scaling (ITS) has largely succeeded in verifiable domains like math and coding, where cheap verification enables scalable output selection. However, extending ITS to tasks prone to systematic failure - driven by faulty initial assumptions or unmet multidimensional …

arXiv stat.ML TIER_1 English(EN) · Charles C. Margossian · 2026-06-05 21:04

Large-Scale Empirical Tuning and Comparison of Default Optimizers for Variational Inference

Black-box variational inference (BBVI) is a methodology for posterior approximation that relies on stochastic optimization. In practice, the stochastic optimizers underpinning BBVI generally require extensive problem-specific tuning, which undermines its promise as a truly "black…

arXiv stat.ML TIER_1 English(EN) · Daniel Haimovich, Fridolin Linder, Lorenzo Perini, Niek Tax, Milan Vojnovic · 2026-06-05 04:00

Convergence of Multicalibration Gradient Boosting

arXiv:2602.06773v2 Announce Type: replace-cross Abstract: Multicalibration gradient boosting has recently emerged as a scalable method that empirically produces approximately multicalibrated predictors and has been deployed at web scale. Despite this empirical success, its conver…

arXiv stat.ML TIER_1 English(EN) · Ziad Kobeissi (L2S), Éloïse Berthier (U2IS) · 2026-06-05 04:00

Fast and Robust Convergence Rates for TD(0) with Linear Function Approximation, a Universal Learning Step, and i.i.d. Samples

arXiv:2606.05967v1 Announce Type: new Abstract: In this paper, we study the finite-time behavior of the TD(0) temporal-difference method with linear function approximation (LFA). We consider on-policy independent and identically distributed (i.i.d.) samples, a constant learning s…

arXiv stat.ML TIER_1 English(EN) · David Janz, Shuai Liu, Alex Ayoub, Csaba Szepesvári · 2026-06-05 04:00

Exploration via Linearly Perturbed Loss Minimisation

arXiv:2311.07565v3 Announce Type: replace-cross Abstract: We introduce exploration via linear loss perturbations (EVILL), a randomised exploration method for structured stochastic bandit problems that works by solving for the minimiser of a linearly perturbed regularised negative…

arXiv stat.ML TIER_1 English(EN) · Yiwei Zhou, Ziheng Chen · 2026-06-05 04:00

Deterministic Envelopes for Tamed SGLD: Decoupling Stochastic-Gradient Noise from Local Taming

arXiv:2606.05242v1 Announce Type: new Abstract: Stochastic-gradient Langevin algorithms often use tamed denominators to stabilize non-globally Lipschitz drifts. This paper shows that when the denominator depends on the same stochastic-gradient realization as the numerator, the ta…

arXiv stat.ML TIER_1 English(EN) · Ziqian Wang, Chenxi Fang, Zhen Zhang · 2026-06-05 04:00

DiffSlack: Learning under Nonlinear Inequality Constraints via Learnable Slack Variables

arXiv:2606.05247v1 Announce Type: cross Abstract: Enforcing nonlinear inequality constraints in neural networks remains challenging, especially when the output is subject to many coupled constraints. Existing hard constraint methods often impose structural restrictions on the con…

arXiv stat.ML TIER_1 English(EN) · Wei Biao Wu · 2026-06-05 02:59

Generalization Bounds under Finite $L_p$ Moments: Stability beyond Bounded Differences

While algorithmic stability is a central tool for understanding generalization of learning algorithms, existing high-probability guarantees typically rely on uniform boundedness or sub-Gaussian/sub-Weibull tail assumptions, which can be overly restrictive for modern settings with…

arXiv stat.ML TIER_1 English(EN) · Éloïse Berthier · 2026-06-04 10:10

Fast and Robust Convergence Rates for TD(0) with Linear Function Approximation, a Universal Learning Step, and i.i.d. Samples

In this paper, we study the finite-time behavior of the TD(0) temporal-difference method with linear function approximation (LFA). We consider on-policy independent and identically distributed (i.i.d.) samples, a constant learning step, and the Polyak-Juditsky averaging method. W…
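The setting in this abstract can be made concrete with a small sketch. It implements TD(0) with linear function approximation, a constant learning step, i.i.d. on-policy samples, and Polyak-Juditsky (also called Polyak-Ruppert) iterate averaging. This is not the paper's analysis. The toy 6-state Markov reward process, the state-aggregation features, γ = 0.5, α = 0.05, and the step count are all assumptions chosen for illustration.

The averaged iterate is compared with the TD fixed point θ*, which solves Aθ* = b. Here A = Φᵀ D (Φ − γ P Φ) and b = Φᵀ D R, where D is the diagonal matrix of the stationary distribution.

```python
import numpy as np


def td0_averaged(P, R, Phi, mu, gamma=0.5, alpha=0.05, n_steps=50_000, seed=0):
    """TD(0) with linear features, a constant step size, i.i.d. on-policy samples,
    and running (Polyak-Juditsky) averaging of the iterates."""
    rng = np.random.default_rng(seed)
    n, d = Phi.shape
    theta = np.zeros(d)
    theta_bar = np.zeros(d)
    states = rng.choice(n, size=n_steps, p=mu)  # s_k ~ mu, drawn i.i.d.
    for k, s in enumerate(states, start=1):
        s_next = rng.choice(n, p=P[s])  # s'_k ~ P(s_k, .)
        td_error = R[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        theta = theta + alpha * td_error * Phi[s]
        theta_bar += (theta - theta_bar) / k  # running mean of theta_1..theta_k
    return theta, theta_bar


def td_fixed_point(P, R, Phi, mu, gamma=0.5):
    """Solve A theta* = b with A = Phi^T D (Phi - gamma P Phi) and b = Phi^T D R."""
    D = np.diag(mu)
    A = Phi.T @ D @ (Phi - gamma * P @ Phi)
    b = Phi.T @ D @ R
    return np.linalg.solve(A, b)


# Hypothetical toy Markov reward process, for illustration only.
rng = np.random.default_rng(42)
n = 6
P = rng.random((n, n)) + 0.1
P /= P.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
R = rng.random(n)
mu = np.full(n, 1.0 / n)
for _ in range(500):  # power iteration for the stationary distribution
    mu = mu @ P
mu /= mu.sum()
# State aggregation: states (0,1), (2,3), (4,5) each share one feature, so d = 3 < n.
Phi = np.kron(np.eye(3), np.ones((2, 1)))

theta_last, theta_avg = td0_averaged(P, R, Phi, mu)
theta_star = td_fixed_point(P, R, Phi, mu)
rel_err = np.linalg.norm(theta_avg - theta_star) / np.linalg.norm(theta_star)
print(f"relative error of averaged iterate: {rel_err:.3f}")
```

Why averaging helps here: the TD update is affine in θ and the samples are i.i.d., so the stationary mean of the constant-step iterates is exactly θ*. Averaging therefore removes most of the constant-step noise.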

arXiv stat.ML TIER_1 English(EN) · Paul Dütting, Federico Fusco, Silvio Lattanzi, Ashkan Norouzi-Fard, Ola Svensson, Morteza Zadimoghaddam · 2026-06-04 04:00

A General Framework for Dynamic Consistent Submodular Maximization

arXiv:2606.04946v1 Announce Type: cross Abstract: Consistency is an important property in dynamic submodular maximization and entails maintaining a near-optimal solution at all times, making only a small number of adjustments to the solution in each step. Prior work has explored …

arXiv stat.ML TIER_1 English(EN) · Chon Wai Ho, Sumeetpal S. Singh, Jiaqi Guo · 2026-06-04 04:00

Bayesian Learning for Stochastic Shortest Path Problems

arXiv:2606.04845v1 Announce Type: new Abstract: Sequential decision-making problems are often modelled as a Markov decision process (MDP). We focus on the stochastic shortest path (SSP) problem, which is an infinite-horizon undiscounted MDP with absorbing terminal states. We deve…

arXiv stat.ML TIER_1 English(EN) · Morteza Zadimoghaddam · 2026-06-03 14:35

A General Framework for Dynamic Consistent Submodular Maximization

Consistency is an important property in dynamic submodular maximization and entails maintaining a near-optimal solution at all times, making only a small number of adjustments to the solution in each step. Prior work has explored this question for the insertion-only case, where t…

arXiv stat.ML TIER_1 English(EN) · Jiaqi Guo · 2026-06-03 13:13

Bayesian Learning for Stochastic Shortest Path Problems

Sequential decision-making problems are often modelled as a Markov decision process (MDP). We focus on the stochastic shortest path (SSP) problem, which is an infinite-horizon undiscounted MDP with absorbing terminal states. We develop a Bayesian framework to learn the optimal de…

arXiv stat.ML TIER_1 English(EN) · Zhen Zhang · 2026-06-03 11:58

DiffSlack: Learning under Nonlinear Inequality Constraints via Learnable Slack Variables

Enforcing nonlinear inequality constraints in neural networks remains challenging, especially when the output is subject to many coupled constraints. Existing hard constraint methods often impose structural restrictions on the constraint set or introduce substantial computational…

arXiv stat.ML TIER_1 English(EN) · Ziheng Chen · 2026-06-03 07:23

Deterministic Envelopes for Tamed SGLD: Decoupling Stochastic-Gradient Noise from Local Taming

Stochastic-gradient Langevin algorithms often use tamed denominators to stabilize non-globally Lipschitz drifts. This paper shows that when the denominator depends on the same stochastic-gradient realization as the numerator, the taming step changes the stochastic oracle itself a…

arXiv stat.ML TIER_1 English(EN) · Yan-Feng Xie, Shuche Wang, Peng Zhao, Zhi-Hua Zhou · 2026-06-03 04:00

Online Learning with Gradient-Variation Interval Regret

arXiv:2606.03831v1 Announce Type: cross Abstract: This paper investigates non-stationary online learning using the metric of interval regret, which requires an online algorithm to perform well over every time interval. We propose the first online learning algorithm that achieves …

arXiv stat.ML TIER_1 English(EN) · Hyunseok Seung, Matthias Katzfuss · 2026-06-03 04:00

Scalable Derivative Gaussian Processes via Exact Gradient Reduction

arXiv:2606.02909v1 Announce Type: new Abstract: Gradient observations can substantially improve Gaussian process (GP) surrogates, particularly in high-dimensional settings where function evaluations are expensive. However, exact inference with $n$ function values and $n$ full gra…

arXiv stat.ML TIER_1 English(EN) · Zhi-Hua Zhou · 2026-06-02 16:16

Online Learning with Gradient-Variation Interval Regret

This paper investigates non-stationary online learning using the metric of interval regret, which requires an online algorithm to perform well over every time interval. We propose the first online learning algorithm that achieves an interval regret bound scaling with gradient var…

arXiv stat.ML TIER_1 English(EN) · Dmitrii M. Ostrovskii · 2026-06-02 04:00

Near-Optimal and Tractable Shift-Invariant Estimation

arXiv:2411.03383v3 Announce Type: replace-cross Abstract: How hard is it to estimate a discrete-time signal $(x_{1}, ..., x_{n}) \in \mathbb{C}^n$ satisfying an unknown linear recurrence relation of order $s$ and observed in i.i.d. complex Gaussian noise? The class of all such si…

arXiv stat.ML TIER_1 English(EN) · Johanna Menn, Miriam Kober, Paul Brunzema, David Stenger, Sebastian Trimpe · 2026-06-02 04:00

Local Preferential Bayesian Optimization

arXiv:2606.02351v1 Announce Type: cross Abstract: Bayesian optimization (BO) is a popular and effective approach for tuning expensive, noisy experiments, but requires the formulation of an explicit objective function. Preferential BO (PBO) removes this requirement by learning fro…

arXiv stat.ML TIER_1 English(EN) · Zijian Liu · 2026-06-02 04:00

Convergence in Expectation of Stochastic Gradient Methods under Heavy-Tailed Noise

arXiv:2606.00520v1 Announce Type: cross Abstract: Many stochastic gradient methods are believed not to converge when the noise in stochastic gradients has only a finite $p$-th moment for $p\in\left(1,2\right)$, a setting known as the heavy-tailed noise assumption. However, some r…

arXiv stat.ML TIER_1 English(EN) · Dimitris Oikonomou, Nicolas Loizou · 2026-06-02 04:00

Safeguarded Stochastic Polyak Step Sizes for Non-Smooth Optimization: Robust Performance without (Sub)Gradients

arXiv:2512.02342v3 Announce Type: replace-cross Abstract: The stochastic Polyak step size (SPS) has proven to be a promising choice for stochastic gradient descent (SGD), delivering competitive performance relative to state-of-the-art methods on smooth convex and non-convex optim…
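As a rough illustration of the stochastic Polyak step size mentioned above, here is the standard capped rule (often called SPS_max). Its step is γ_k = min((f_i(w) − f_i^*) / (c‖g_i‖²), γ_b). The rule is applied to a non-smooth absolute-loss regression problem.

This is not the paper's safeguarded variant, whose details are not given in the excerpt. The synthetic data, c = 1, γ_b = 10, and the epoch count are assumptions.

```python
import numpy as np


def sgd_sps_max(X, y, c=1.0, gamma_b=10.0, n_epochs=30, seed=0):
    """Stochastic subgradient method with the capped Polyak step
    gamma_k = min((f_i(w) - f_i^*) / (c * ||g_i||^2), gamma_b),
    applied to the non-smooth losses f_i(w) = |x_i . w - y_i| with f_i^* = 0."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            resid = X[i] @ w - y[i]
            loss_i = abs(resid)            # f_i(w); its minimum f_i^* is 0 here
            g = np.sign(resid) * X[i]      # a subgradient of f_i at w
            g_norm2 = g @ g
            if g_norm2 == 0.0:             # w already minimizes f_i
                continue
            step = min(loss_i / (c * g_norm2), gamma_b)
            w = w - step * g
    return w


# Hypothetical noise-free data, so every f_i can reach 0 (interpolation).
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 5))
w_true = rng.standard_normal(5)
y = X @ w_true
w_hat = sgd_sps_max(X, y)
print(np.max(np.abs(w_hat - w_true)))
```

With c = 1, interpolation, and an inactive cap, each update projects w exactly onto the hyperplane {w : x_i · w = y_i}. In that regime the method behaves like the randomized Kaczmarz method, and no (sub)gradient-norm tuning is needed.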

arXiv stat.ML TIER_1 English(EN) · Yuanzhe Tao, Yifeng Liu, Huizhuo Yuan, Xun Zhou, Yuan Cao, Quanquan Gu · 2026-06-02 04:00

Towards Simple and Provable Parameter-Free Adaptive Gradient Methods

arXiv:2412.19444v2 Announce Type: replace-cross Abstract: Optimization algorithms such as AdaGrad and Adam have significantly advanced the training of deep models by dynamically adjusting the learning rate during the optimization process. However, ad-hoc tuning of learning rates …

arXiv stat.ML TIER_1 English(EN) · Tongyu Li, Alexander Giessing · 2026-06-02 04:00

Statistical Inference on Gradient Flows

arXiv:2606.01257v1 Announce Type: cross Abstract: Gradient-based algorithms are central to modern statistical estimation, yet their statistical analysis is often restricted to fixed-time behavior, such as convergence to a population target or fluctuations at a prescribed iteratio…

arXiv stat.ML TIER_1 English(EN) · Luca Muscarnera, Silas Ruhrberg Estévez, Yuanzhang Xiao, Mihaela Van der Schaar · 2026-06-02 04:00

Fast Post-Interpolation Generalization via Critically Damped Momentum Optimization

arXiv:2606.01521v1 Announce Type: cross Abstract: A central problem in machine learning is that models can achieve near-perfect training performance while generalizing substantially less well to unseen examples. This gap is especially acute in high-dimensional, low-sample regimes…

arXiv stat.ML TIER_1 English(EN) · Yuexiao Dong, Kenichiro Mcalinn, Edoardo Airoldi, Lei Li · 2026-06-02 04:00

FlowSDR: Sufficient Dimension Reduction via Conditional Normalizing Flows

arXiv:2606.01346v1 Announce Type: cross Abstract: Sufficient dimension reduction (SDR) seeks a low-dimensional linear projection of predictors that preserves the conditional distribution of the response. Existing methods target this conditional distribution indirectly, via invers…

arXiv stat.ML TIER_1 English(EN) · Thibault Pautrel, François Portier · 2026-06-02 04:00

Riemannian Stochastic Optimization for Sufficient Dimension Reduction

arXiv:2606.00413v1 Announce Type: new Abstract: Sufficient dimension reduction (SDR) makes high-dimensional regression tractable by projecting the covariates onto a low-dimensional subspace that preserves the conditional mean of the response. Existing gradient-based estimators ei…

arXiv stat.ML TIER_1 English(EN) · Matthias Katzfuss · 2026-06-01 21:29

Scalable Derivative Gaussian Processes via Exact Gradient Reduction

Gradient observations can substantially improve Gaussian process (GP) surrogates, particularly in high-dimensional settings where function evaluations are expensive. However, exact inference with $n$ function values and $n$ full gradients in $d$ dimensions scales cubically in the…

arXiv stat.ML TIER_1 English(EN) · Sebastian Trimpe · 2026-06-01 15:00

Local Preferential Bayesian Optimization

Bayesian optimization (BO) is a popular and effective approach for tuning expensive, noisy experiments, but requires the formulation of an explicit objective function. Preferential BO (PBO) removes this requirement by learning from pairwise human feedback, yet existing methods st…

arXiv stat.ML TIER_1 English(EN) · Dario Draca, Takuo Matsubara, Minh-Ngoc Tran · 2026-06-01 04:00

Inversion-Free Natural Gradient Descent on Riemannian Manifolds

arXiv:2604.02969v2 Announce Type: replace Abstract: The natural gradient method is a central tool for statistical optimisation, but its broader application is hindered by the assumption of a Euclidean parameter space, the repeated estimation of the Fisher information matrix (FIM)…

arXiv stat.ML TIER_1 English(EN) · Michael Ibrahim, Hanqi Zhao, Eli Sennesh, Zhi Li, Anqi Wu, Jacob L. Yates, Chengrui Li, Hadi Vafaii · 2026-06-01 04:00

A Hitchhiker's Guide to Poisson Gradient Estimation

arXiv:2602.03896v2 Announce Type: replace Abstract: Poisson-distributed latent variable models are widely used in computational neuroscience, but differentiating through discrete stochastic samples remains challenging. Two approaches address this: *Exponential Arrival Time* (EAT)…

arXiv stat.ML TIER_1 Deutsch(DE) · Facheng Yu, Ronak Mehta, Alex Luedtke, Zaid Harchaoui · 2026-06-01 04:00

Stochastic Gradients and Nuisance Terms

arXiv:2508.20326v2 Announce Type: replace Abstract: Stochastic gradient optimization is the dominant learning paradigm for a variety of scenarios, from classical supervised learning to modern self-supervised learning. We consider stochastic gradient algorithms for learning proble…

arXiv stat.ML TIER_1 English(EN) · Mihaela Van der Schaar · 2026-06-01 00:54

Fast Post-Interpolation Generalization via Critically Damped Momentum Optimization

A central problem in machine learning is that models can achieve near-perfect training performance while generalizing substantially less well to unseen examples. This gap is especially acute in high-dimensional, low-sample regimes, where many interpolating solutions exist and opt…

arXiv stat.ML TIER_1 English(EN) · Lei Li · 2026-05-31 16:54

FlowSDR: Sufficient Dimension Reduction via Conditional Normalizing Flows

Sufficient dimension reduction (SDR) seeks a low-dimensional linear projection of predictors that preserves the conditional distribution of the response. Existing methods target this conditional distribution indirectly, via inverse moments, local forward regression, or neural ens…

arXiv stat.ML TIER_1 English(EN) · Alexander Giessing · 2026-05-31 14:22

Statistical Inference on Gradient Flows

Gradient-based algorithms are central to modern statistical estimation, yet their statistical analysis is often restricted to fixed-time behavior, such as convergence to a population target or fluctuations at a prescribed iteration. In many applications, however, uncertainty quan…

arXiv stat.ML TIER_1 English(EN) · Zijian Liu · 2026-05-30 04:27

Convergence in Expectation of Stochastic Gradient Methods under Heavy-Tailed Noise

Many stochastic gradient methods are believed not to converge when the noise in stochastic gradients has only a finite $p$-th moment for $p\in\left(1,2\right)$, a setting known as the heavy-tailed noise assumption. However, some recent studies have found that Stochastic Gradient …

arXiv stat.ML TIER_1 English(EN) · François Portier · 2026-05-29 23:06

Riemannian Stochastic Optimization for Sufficient Dimension Reduction

Sufficient dimension reduction (SDR) makes high-dimensional regression tractable by projecting the covariates onto a low-dimensional subspace that preserves the conditional mean of the response. Existing gradient-based estimators either operate in the ambient space and suffer fro…

arXiv stat.ML TIER_1 English(EN) · Rocco Caprio, Adrien Corenflos, Sam Power · 2026-05-29 04:00

Wasserstein Contraction of Coordinate Ascent Variational Inference

arXiv:2605.30253v1 Announce Type: new Abstract: We study the contraction in Wasserstein distance of the coordinate ascent variational inference algorithm. This is shown to hold under a transport-information inequality at the fixed points and a functional smoothness condition. The…

arXiv stat.ML TIER_1 English(EN) · Rustem Islamov, Michael Crawshaw, Jeremy Cohen, Robert Gower · 2026-05-29 04:00

Non-Euclidean Gradient Descent Operates at the Edge of Stability

arXiv:2603.05002v2 Announce Type: replace-cross Abstract: The Edge of Stability (EoS) is a phenomenon where the sharpness (largest eigenvalue) of the Hessian approaches and then hovers near the stability threshold $2/\eta$ during gradient descent (GD) with step size $\eta$. Despi…
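The 2/η threshold quoted above is the classical stability condition of gradient descent on a quadratic. On f(x) = ½λx², GD gives x_{k+1} = (1 − ηλ)x_k, which converges if and only if 0 < λ < 2/η.

A minimal sketch checks this in the plain Euclidean case; the paper itself studies non-Euclidean variants. The step size η = 0.1 and the two sharpness values are arbitrary assumptions.

```python
def gd_on_quadratic(sharpness, eta, x0=1.0, n_steps=200):
    """Run gradient descent on f(x) = 0.5 * sharpness * x**2; return the last iterate."""
    x = x0
    for _ in range(n_steps):
        # f'(x) = sharpness * x, so each step multiplies x by (1 - eta * sharpness).
        x -= eta * sharpness * x
    return x


eta = 0.1
threshold = 2.0 / eta                           # stability threshold 2 / eta
below = gd_on_quadratic(0.95 * threshold, eta)  # |1 - eta * lambda| = 0.9: converges
above = gd_on_quadratic(1.05 * threshold, eta)  # |1 - eta * lambda| = 1.1: diverges
print(below, above)
```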

arXiv stat.ML TIER_1 English(EN) · Sam Power · 2026-05-28 17:16

Wasserstein Contraction of Coordinate Ascent Variational Inference

We study the contraction in Wasserstein distance of the coordinate ascent variational inference algorithm. This is shown to hold under a transport-information inequality at the fixed points and a functional smoothness condition. The results are general and sharp, allow for local …

arXiv stat.ML TIER_1 English(EN) · Jack Timmermans, Sergio A. Alvarez · 2026-05-28 04:00

Optimal Ridge Regression Regularization Revisited

arXiv:2605.28679v1 Announce Type: cross Abstract: We consider $L^2$-regularized linear (ridge) regression over a finite data sample $X$ with bounded covariance and linear prediction targets $y$ with additive isotropic noise of finite variance. We present an iterative procedure to…

arXiv stat.ML TIER_1 English(EN) · Yibo Jacky Zhang, Zeyu Tang, Sanmi Koyejo · 2026-05-28 04:00

Is Backpropagation Optimal? When Synthetic Gradients Improve Sample Efficiency

arXiv:2605.27946v1 Announce Type: new Abstract: Backpropagation is the default learning rule for artificial neural networks and is often treated as the settled approach whenever differentiability is available. In this work, we revisit this convention through a theoretical lens of…

arXiv stat.ML TIER_1 English(EN) · Sergei Tikhonov, Arsen Vasilyan · 2026-05-28 04:00

Proper Agnostic Learning of Functions of Halfspaces under Gaussian Marginals

arXiv:2605.27594v1 Announce Type: cross Abstract: We study the problem of computationally efficient proper agnostic learning of multidimensional concept classes under the Gaussian distribution. In this setting, given i.i.d. labeled samples from an unknown distribution over $\math…

arXiv stat.ML TIER_1 English(EN) · Qin Lu, Konstantinos D. Polyzos, Bingcong Li, Georgios B. Giannakis · 2026-05-28 04:00

Surrogate Models in Bayesian Optimization: Beyond a Single Gaussian Process

arXiv:2205.14090v2 Announce Type: replace Abstract: Bayesian optimization (BO) has well-documented merits for optimizing black-box functions with an expensive evaluation cost. Such functions emerge in applications as diverse as hyperparameter tuning, drug discovery, and robotics.…

arXiv stat.ML TIER_1 English(EN) · Kamélia Daudel, François Roueff · 2026-05-28 04:00

Learning with Importance Weighted Variational Inference

arXiv:2410.12035v2 Announce Type: replace Abstract: Several variational bounds involving importance weighting ideas generalize the Evidence Lower BOund (ELBO) for marginal likelihood optimization, such as the Importance-weighted Auto-Encoder (IWAE), Variational Rényi (VR) and V…

arXiv stat.ML TIER_1 English(EN) · Tam Le (LPSM) · 2026-05-28 04:00

Unregularized Limit of Stochastic Gradient Methods for Wasserstein Distributionally Robust Optimization

arXiv:2506.04948v2 Announce Type: replace-cross Abstract: Wasserstein distributionally robust optimization offers a framework for model fitting in machine learning under potential shifts in the data distribution. We study a regularized variant of this problem in which entropic sm…

arXiv stat.ML TIER_1 English(EN) · Stefano Bruno, Youngsik Hwang, Jaehyeon An, Sotirios Sabanis, Dong-Young Lim · 2026-05-28 04:00

Flatness-Aware Stochastic Gradient Langevin Dynamics

arXiv:2510.02174v3 Announce Type: replace-cross Abstract: Flatness of the loss landscape has been widely studied as an important perspective for understanding the behavior and generalization of deep learning algorithms. Motivated by this view, we propose Flatness-Aware Stochastic…

arXiv stat.ML TIER_1 English(EN) · Sergio A. Alvarez · 2026-05-27 16:12

Optimal Ridge Regression Regularization Revisited

We consider $L^2$-regularized linear (ridge) regression over a finite data sample $X$ with bounded covariance and linear prediction targets $y$ with additive isotropic noise of finite variance. We present an iterative procedure to compute the optimal regularization strength numer…
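For reference, the ridge estimator in this abstract has the closed form ŵ(λ) = (XᵀX + λI)⁻¹Xᵀy. The sketch below computes it for several values of λ and shows the shrinkage effect: ‖ŵ(λ)‖ decreases as λ grows, and λ = 0 recovers ordinary least squares.

The synthetic data, the noise level, and the λ grid are assumptions. The sketch does not implement the paper's iterative procedure for the optimal regularization strength.

```python
import numpy as np


def ridge(X, y, lam):
    """Closed-form ridge estimator: solve (X^T X + lam * I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


# Hypothetical data: linear targets plus additive isotropic noise of finite variance.
rng = np.random.default_rng(0)
n, d = 40, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.5 * rng.standard_normal(n)

lams = [0.0, 1.0, 10.0, 100.0]
norms = [np.linalg.norm(ridge(X, y, lam)) for lam in lams]
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least squares for comparison
print(dict(zip(lams, norms)))
```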

arXiv stat.ML TIER_1 English(EN) · Sanmi Koyejo · 2026-05-27 04:37

Is Backpropagation Optimal? When Synthetic Gradients Improve Sample Efficiency

Backpropagation is the default learning rule for artificial neural networks and is often treated as the settled approach whenever differentiability is available. In this work, we revisit this convention through a theoretical lens of sample efficiency. We introduce a unified vecto…

arXiv stat.ML TIER_1 English(EN) · Zhaosong Lu, Xiangyuan Wang · 2026-05-27 04:00

A Full-Order Method for Constrained Nonconvex-Nonconcave Minimax Optimization

arXiv:2510.01168v3 Announce Type: replace-cross Abstract: We study a class of constrained nonconvex-nonconcave minimax optimization problems in which the inner maximization involves potentially complex constraints. Under the assumption that the inner problem of a novel lifted min…

arXiv stat.ML TIER_1 English(EN) · Zusen Xu, Jia-Jie Zhu · 2026-05-27 04:00

Distributionally Robust Optimization via Gradient-Flow Samplers

arXiv:2510.25956v3 Announce Type: replace-cross Abstract: We propose a mathematically principled PDE gradient flow framework for distributionally robust optimization (DRO). Exploiting the recent advances in the intersection of Markov Chain Monte Carlo sampling and gradient flow t…

arXiv stat.ML TIER_1 English(EN) · Mikalai Korbit, Mario Zanon · 2026-05-27 04:00

Incremental Gauss-Newton Descent for Machine Learning

arXiv:2408.05560v2 Announce Type: replace-cross Abstract: Stochastic gradient updates are widely used for their efficiency and scalability, but their effective step sizes can depend strongly on feature scaling and local model sensitivity. Gauss-Newton methods address such scale e…

arXiv stat.ML TIER_1 English(EN) · Arsen Vasilyan · 2026-05-26 19:07

Proper Agnostic Learning of Functions of Halfspaces under Gaussian Marginals

We study the problem of computationally efficient proper agnostic learning of multidimensional concept classes under the Gaussian distribution. In this setting, given i.i.d. labeled samples from an unknown distribution over $\mathbb{R}^d \times \{\pm 1\}$ whose marginal on $\math…

arXiv stat.ML TIER_1 English(EN) · Navil Nandhan, Abbas Khademi, Antonio Silveti-Falls · 2026-05-26 04:00

A Boosted Stochastic Frank-Wolfe Algorithm for Constrained Nonconvex Optimization

arXiv:2605.25255v1 Announce Type: cross Abstract: The boosted Frank-Wolfe algorithm accelerates the classical Frank-Wolfe algorithm by better aligning the update direction with the negative gradient. Its analysis, however, has been limited to deterministic convex problems, with s…

arXiv stat.ML TIER_1 English(EN) · Wenhao Yang · 2026-05-25 16:18

Statistical Inference for Stochastic Gradient Descent beyond Finite Variance

Stochastic gradient descent (SGD) is a foundational algorithm for large-scale statistical learning and stochastic optimization. However, statistical inference based on SGD iterates remains challenging when stochastic gradients have infinite variance, as the relevant limiting dist…

arXiv stat.ML TIER_1 English(EN) · Antonio Silveti-Falls · 2026-05-24 21:04

A Boosted Stochastic Frank-Wolfe Method for Constrained Nonconvex Optimization

The boosted Frank-Wolfe algorithm accelerates the classical Frank-Wolfe algorithm by better aligning the update direction with the negative gradient. Its analysis, however, has been limited to deterministic convex problems, with step sizes that require either line search or knowl…

arXiv stat.ML TIER_1 English(EN) · Krishnakumar Balasubramanian · 2026-05-22 04:00

Finite-Particle Convergence Rates for Conservative and Non-Conservative Drifting Models

arXiv:2605.22795v1 Announce Type: new Abstract: We propose and analyze a conservative drifting method for one-step generative modeling. The method replaces the original displacement-based drifting velocity by a kernel density estimator (KDE)-gradient velocity, namely the differen…

arXiv cs.CV TIER_1 English(EN) · Gang Dai, Yining Huang, Yiming Xia, Guohao Chen, Shuaicheng Niu · 2026-05-22 04:00

Guided Trajectory Optimization with Sparse Scaling for Test-Time Diffusion

arXiv:2605.21907v1 Announce Type: new Abstract: The efficient Test-Time Scaling (TTS) paradigm offers a promising perspective for enhancing the generation performance of diffusion models. However, current solutions are limited to a static, pre-defined noise pool and suffer from i…

arXiv stat.ML TIER_1 English(EN) · Krishnakumar Balasubramanian · 2026-05-21 17:49

Finite-Particle Convergence Rates for Conservative and Non-Conservative Drifting Models

We propose and analyze a conservative drifting method for one-step generative modeling. The method replaces the original displacement-based drifting velocity by a kernel density estimator (KDE)-gradient velocity, namely the difference of the kernel-smoothed data score and the ker…

arXiv stat.ML TIER_1 English(EN) · Tansheng Zhu, Hongyu Zhou, Ke Jin, Xusheng Xu, Qiufan Yuan, Lijie Ji · 2026-05-21 04:00

Bayesian Optimization with Kernel Regression and Density-Based Exploration

arXiv:2502.06178v5 Announce Type: replace-cross Abstract: Bayesian optimization is highly effective for optimizing expensive-to-evaluate black-box functions, but it faces significant computational challenges due to the cubic per-iteration cost of Gaussian processes, which results…

arXiv stat.ML TIER_1 Italiano(IT) · Fares El Khoury, Houssam Zenati, Nathan Kallus, Michael Arbel, Aurélien Bibaut · 2026-05-21 04:00

Semiparametrically Efficient Bilevel Gradient Estimation

arXiv:2605.21341v1 Announce Type: new Abstract: Functional bilevel methods estimate a lower-level function and plug it into a hypergradient, but this plug-in gradient can retain first-order bias when the lower-level problem is learned nonparametrically. To remove this bias, we de…

arXiv stat.ML TIER_1 English(EN) · Shubhada Agrawal, Siva Theja Maguluri, Martin Zubeldia · 2026-05-21 04:00

Concentration of General Stochastic Approximation under Heavy-Tailed Markovian Noise

arXiv:2605.20999v1 Announce Type: cross Abstract: We establish maximal concentration bounds for the iterates generated by stochastic approximation algorithms with general step sizes, where the noise has a finite-state Markovian component plus a Martingale-difference component. Wh…

arXiv stat.ML TIER_1 Italiano(IT) · Aurélien Bibaut · 2026-05-20 16:07

Semiparametrically Efficient Bilevel Gradient Estimation

Functional bilevel methods estimate a lower-level function and plug it into a hypergradient, but this plug-in gradient can retain first-order bias when the lower-level problem is learned nonparametrically. To remove this bias, we develop a semiparametric debiasing theory for popu…

arXiv stat.ML TIER_1 English(EN) · Martin Zubeldia · 2026-05-20 10:38

Concentration of General Stochastic Approximation under Heavy-Tailed Markovian Noise

We establish maximal concentration bounds for the iterates generated by stochastic approximation algorithms with general step sizes, where the noise has a finite-state Markovian component plus a Martingale-difference component. When the Martingale-difference noise is bounded, we …

arXiv stat.ML TIER_1 English(EN) · Kyurae Kim, Qiang Fu, Yi-An Ma, Jacob R. Gardner, Trevor Campbell · 2026-05-20 04:00

From Bures-Wasserstein to Parameter Space: Stochastic Gradient Variational Inference with the Price Gradient Estimator

arXiv:2602.18718v2 Announce Type: replace Abstract: For approximating a target distribution given only its unnormalized log-density, stochastic gradient-based variational inference (VI) algorithms are a popular approach. For example, Wasserstein VI (WVI) and black-box VI (BBVI) p…

arXiv stat.ML TIER_1 English(EN) · Sharan Sahu, Cameron J. Hogan, Martin T. Wells · 2026-05-20 04:00

On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization

arXiv:2601.12238v4 Announce Type: replace Abstract: In this paper, we provide a comprehensive theoretical analysis of Stochastic Gradient Descent (SGD) and its momentum variants (Polyak Heavy-Ball and Nesterov) for tracking time-varying optima under strong convexity and smoothnes…

arXiv stat.ML TIER_1 English(EN) · Yohann De Castro (ICJ, ECL, IUF, PSPM), Sébastien Gadat (TSE-R, IUF), Clément Marteau (ICJ, UCBL, PSPM) · 2026-05-20 04:00

Fast Spawn&Prune (FS&P): Global Convergence of Stochastic Conic Particle Gradient Descent via Birth-Death Processes

arXiv:2605.19784v1 Announce Type: cross Abstract: We investigate the global optimization of the objective function arising in continuous sparse regression, specifically the Beurling LASSO (BLASSO), over the space of measures. While Conic Particle Gradient Descent (CPGD) methods a…

arXiv stat.ML TIER_1 English(EN) · Clément Marteau · 2026-05-19 12:50

Fast Spawn&Prune (FS&P): Global Convergence of Stochastic Conic Particle Gradient Descent via Birth-Death Processes

We investigate the global optimization of the objective function arising in continuous sparse regression, specifically the Beurling LASSO (BLASSO), over the space of measures. While Conic Particle Gradient Descent (CPGD) methods are computationally efficient, they may become trap…

arXiv stat.ML TIER_1 English(EN) · Waïss Azizian, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos · 2026-05-19 04:00

What Is the Long-Run Distribution of Stochastic Gradient Descent? A Large Deviations Analysis

arXiv:2406.09241v3 Announce Type: replace-cross Abstract: In this paper, we examine the long-run distribution of stochastic gradient descent (SGD) in general, non-convex problems. Specifically, we seek to understand which regions of the problem's state space are more likely to be…

arXiv stat.ML TIER_1 English(EN) · Zijian Liu · 2026-05-19 04:00

Gradient Clipping Methods for Non-Smooth Convex Optimization under Heavy-Tailed Noise: A Refined Analysis

arXiv:2512.23178v3 Announce Type: replace-cross Abstract: Optimization under heavy-tailed noise has become popular recently, since it better fits many modern machine learning tasks, as captured by empirical observations. Concretely, instead of a finite second moment on gradient n…
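To illustrate the setting (not the paper's method or step-size schedule), the sketch below runs clipped stochastic subgradient descent on the non-smooth convex function f(x) = |x − 3|. The gradient noise is Student-t with 1.5 degrees of freedom, which has a finite p-th moment only for p < 1.5 and therefore infinite variance.

The objective, the noise law, the clipping level τ = 5, the schedule η_k = 0.5/√k, and the second-half averaging window are all assumptions.

```python
import numpy as np


def clipped_sgd(subgrad, x0, n_steps=20_000, eta0=0.5, tau=5.0, seed=0):
    """Clipped stochastic subgradient descent with step eta_k = eta0 / sqrt(k);
    returns the average of the second half of the iterates."""
    rng = np.random.default_rng(seed)
    x = x0
    iterates = []
    for k in range(1, n_steps + 1):
        # Heavy-tailed noise: Student-t(1.5) has finite p-th moments only for p < 1.5.
        g = subgrad(x) + rng.standard_t(df=1.5)
        g = float(np.clip(g, -tau, tau))  # clip the stochastic subgradient to [-tau, tau]
        x -= eta0 / np.sqrt(k) * g
        iterates.append(x)
    return float(np.mean(iterates[n_steps // 2:]))


# Hypothetical non-smooth convex objective f(x) = |x - 3|, minimized at x = 3.
x_avg = clipped_sgd(lambda x: np.sign(x - 3.0), x0=0.0)
print(x_avg)
```

Clipping bounds every update by η_k τ, so a single extreme noise draw cannot throw the iterate far away. Because the noise is symmetric and clipping is monotone, the clipped subgradient still points toward the minimizer on average.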

arXiv stat.ML TIER_1 English(EN) · Tobias Brock, Thomas Nagler · 2026-05-19 04:00

Fast Convergence Rates for Weighted Risk Minimization under Nonstationarity

arXiv:2602.05742v2 Announce Type: replace Abstract: Weighted empirical risk minimization is a common approach to prediction under distribution drift. This article studies its out-of-sample prediction error under nonstationarity. We provide a general decomposition of the excess ri…

arXiv stat.ML TIER_1 English(EN) · Ye He, Krishnakumar Balasubramanian, Sayan Banerjee, Promit Ghosal · 2026-05-19 04:00

Finite-Particle Rates for Regularized Stein Variational Gradient Descent

arXiv:2602.05172v2 Announce Type: replace Abstract: We derive finite-particle rates for the regularized Stein variational gradient descent (R-SVGD) algorithm introduced by He et al. (2024) that corrects the constant-order bias of the SVGD by applying a resolvent-type precondition…

arXiv stat.ML TIER_1 English(EN) · Zijian Liu · 2026-05-19 04:00

Do Adaptive Gradient Methods Converge under Heavy-Tailed Noise? A Case Study of AdaGrad

arXiv:2605.18694v1 Announce Type: cross Abstract: Many tasks in modern machine learning are observed to involve heavy-tailed gradient noise during the optimization process. To manage this realistic and challenging setting, new mechanisms, such as gradient clipping and gradient no…

arXiv stat.ML TIER_1 English(EN) · Zijian Liu · 2026-05-18 17:30

Do Adaptive Gradient Methods Converge under Heavy-Tailed Noise? A Case Study of AdaGrad

Many tasks in modern machine learning are observed to involve heavy-tailed gradient noise during the optimization process. To manage this realistic and challenging setting, new mechanisms, such as gradient clipping and gradient normalization, have been introduced to ensure the co…

r/MachineLearning TIER_1 English(EN) · /u/Otaku_7nfy · 2026-06-03 11:57

TorchDAE: Implicit DAE Solvers with Index Reduction and Adjoint Sensitivities [P]

Hello everyone, I've been working on a PyTorch library for solving Differential Algebraic Equations (DAEs) that supports vectorized execution and GPU acceleration. The library implements several algorithms that are not currently ava…

Sources [235]