Algorithm for Contextual Queueing Bandits with Rate-Optimal Queue Length Regret

New Bandit Algorithms Address Replicability, Kernel Complexity, and Streaming Data

By the PulseAugur Editorial Team · [11 sources] · 2026-06-08 03:58

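Researchers have published several papers on advances in multi-armed bandit algorithms. One study introduces replicable UCB-based exploration for stochastic and linear bandits, with improved regret guarantees. Another paper places Gaussian-process UCB (GP-UCB) and decision-estimation-coefficient (DEC) methods for kernel bandits in a common framework, drawing a distinction between algorithmic-information complexity and minimax complexity. New algorithms also address sliding-window streaming bandits with limited memory and contextual queueing bandits, achieving improved regret rates and characterizing the minimax dependence.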
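Several of these papers build on UCB (upper confidence bound) exploration. As a point of reference only, here is a minimal sketch of the textbook UCB1 rule for a stochastic multi-armed bandit with rewards in [0, 1]. It is not the replicable algorithm from the first paper, nor GP-UCB; the function name `ucb1`, the `pull` callback, and the Bernoulli arm means are illustrative assumptions.

```python
import math
import random


def ucb1(pull, n_arms, horizon, seed=0):
    """Textbook UCB1 for a stochastic multi-armed bandit (illustrative sketch).

    `pull(arm, rng)` is a hypothetical callback returning a reward in [0, 1].
    Returns per-arm pull counts and empirical means after `horizon` rounds.
    """
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play each arm once to initialise its estimate
        else:
            # Optimism: empirical mean plus a confidence radius that
            # shrinks as the arm is pulled more often.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        r = pull(arm, rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
    return counts, means


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.7]  # hypothetical Bernoulli arm means

    def bernoulli(arm, rng):
        return 1.0 if rng.random() < probs[arm] else 0.0

    counts, means = ucb1(bernoulli, len(probs), horizon=5000)
    print(counts)  # the 0.7 arm should receive most of the pulls
```

The confidence radius `sqrt(2 ln t / n)` is what makes exploration "optimistic": rarely pulled arms keep a wide bonus until the data rules them out.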

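Impact: Advances in multi-armed bandit algorithms could lead to more efficient online learning systems for recommendation engines, resource allocation, and experimentation platforms.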

Ranking rationale: Several academic papers posted on arXiv present algorithms and theoretical analyses in the multi-armed bandit field.

AI-generated summary · Google Gemini · from 11 sources.

Sources [11]

arXiv cs.LG TIER_1 English(EN) · Rohan Deb, Udaya Ghai, Karan Singh, Arindam Banerjee · 2026-06-10 04:00

Replicable Bandits with UCB based Exploration

arXiv:2604.20024v2 Announce Type: replace Abstract: We study replicable algorithms for stochastic multi-armed bandits (MAB) and linear bandits with UCB (Upper Confidence Bound) based exploration. A bandit algorithm is $\rho$-replicable if two executions using shared internal rand…
arXiv cs.LG TIER_1 English(EN) · Yunbei Xu · 2026-06-10 04:00

Algorithmic and Minimax Complexities in Kernel Bandits

arXiv:2606.11171v1 Announce Type: new Abstract: Gaussian-process upper confidence bound (GP-UCB) and decision-estimation-coefficient (DEC) methods may appear, at first sight, to belong to different theories. This paper places the two viewpoints in a common algorithmic-information…
arXiv cs.LG TIER_1 English(EN) · Yunbei Xu · 2026-06-09 17:49

Algorithmic and Minimax Complexities in Kernel Bandits

Gaussian-process upper confidence bound (GP-UCB) and decision-estimation-coefficient (DEC) methods may appear, at first sight, to belong to different theories. This paper places the two viewpoints in a common algorithmic-information language for frequentist RKHS bandits. GP-UCB f…
arXiv cs.LG TIER_1 English(EN) · Vladimir Braverman, Chen Wang, Liudeng Wang, Samson Zhou · 2026-06-09 04:00

Online Learning with Recency: Algorithms for Sliding-Window Streaming Multi-Armed Bandits

arXiv:2606.08977v1 Announce Type: new Abstract: Motivated by the recency effect in online learning, we study algorithms for single-pass *sliding-window streaming multi-armed bandits (MABs)* in this paper. In this setting, we are given $n$ arms with unknown sub-Gaussian reward dis…
arXiv cs.LG TIER_1 English(EN) · Seoungbin Bae, Dabeen Lee · 2026-06-09 04:00

Algorithm for Contextual Queueing Bandits with Rate-Optimal Queue Length Regret

arXiv:2606.09668v1 Announce Type: new Abstract: Contextual queueing bandits provide a framework for learning to schedule heterogeneous jobs under unknown context-dependent service rates. Under stochastic contexts, existing algorithms achieve $\widetilde{\mathcal{O}}(T^{-1/4})$ qu…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-08 17:53

Bandits for Efficient Experimentation: Adapting to Control Group, Preferences, and Context Drifts

We consider a variant of the linear contextual stochastic multi-armed bandits, where the learner must provide recommendations to a group of users, each having its personalized preference vector, and in the presence of context distributions that are drifting over time. Under pract…
arXiv cs.LG TIER_1 English(EN) · Dabeen Lee · 2026-06-08 15:51

Algorithm for Contextual Queueing Bandits with Rate-Optimal Queue Length Regret

Contextual queueing bandits provide a framework for learning to schedule heterogeneous jobs under unknown context-dependent service rates. Under stochastic contexts, existing algorithms achieve $\widetilde{\mathcal{O}}(T^{-1/4})$ queue length regret, defined as the expected diffe…
arXiv stat.ML TIER_1 English(EN) · Deqi Zheng, Xiaoyang Xu, Yuhong Yang · 2026-06-09 04:00

Multi-Armed Bandits with Arm Arrivals: Sequential Screening, Dynamic Regret, and Sublinear Guarantees

arXiv:2606.09002v1 Announce Type: new Abstract: We study a stochastic multi-armed bandit problem in which the set of available arms expands over time. This setting arises in sequential experimentation when new actions or treatments become available during an ongoing study, making…
arXiv stat.ML TIER_1 English(EN) · Udvas Das, Waris Radji, Debabrota Basu, Odalric-Ambrym Maillard · 2026-06-09 04:00

Bandits for Efficient Experimentation: Adapting to Control Group, Preferences, and Context Drifts

arXiv:2606.09802v1 Announce Type: cross Abstract: We consider a variant of the linear contextual stochastic multi-armed bandits, where the learner must provide recommendations to a group of users, each having its personalized preference vector, and in the presence of context dist…
arXiv stat.ML TIER_1 English(EN) · Odalric-Ambrym Maillard · 2026-06-08 17:53

Bandits for Efficient Experimentation: Adapting to Control Group, Preferences, and Context Drifts

We consider a variant of the linear contextual stochastic multi-armed bandits, where the learner must provide recommendations to a group of users, each having its personalized preference vector, and in the presence of context distributions that are drifting over time. Under pract…
arXiv stat.ML TIER_1 English(EN) · Yuhong Yang · 2026-06-08 03:58

Multi-Armed Bandits with Arm Arrivals: Sequential Screening, Dynamic Regret, and Sublinear Guarantees

We study a stochastic multi-armed bandit problem in which the set of available arms expands over time. This setting arises in sequential experimentation when new actions or treatments become available during an ongoing study, making regret against a single best arm in hindsight i…
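Several of the abstracts above deal with non-stationarity: the sliding-window streaming paper is motivated by the recency effect, and the efficient-experimentation paper handles drifting context distributions. A common baseline for recency-weighted bandits is sliding-window UCB (SW-UCB), which forgets observations older than a fixed window. The sketch below is a simplified version of that idea under our own assumptions (the exploration constant `c`, a hypothetical `pull(arm, t, rng)` callback, and a hand-chosen `window`); it is not the limited-memory single-pass algorithm from the streaming paper, whose abstract is truncated here.

```python
import math
import random
from collections import deque


def sliding_window_ucb(pull, n_arms, horizon, window, c=2.0, seed=0):
    """SW-UCB-style sketch: only the last `window` (arm, reward) pairs count.

    Old observations fall out of the window, so the estimates track recent
    rewards. `pull(arm, t, rng)` is a hypothetical reward callback and `c`
    is an assumed exploration constant. Returns the sequence of arms played.
    """
    rng = random.Random(seed)
    history = deque()      # (arm, reward) pairs currently inside the window
    counts = [0] * n_arms  # windowed pull counts
    sums = [0.0] * n_arms  # windowed reward sums
    played = []
    for t in range(1, horizon + 1):
        unseen = [a for a in range(n_arms) if counts[a] == 0]
        if unseen:
            arm = unseen[0]  # an arm with no data in the window is tried first
        else:
            log_term = math.log(min(t, window))
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(c * log_term / counts[a]),
            )
        r = pull(arm, t, rng)
        history.append((arm, r))
        counts[arm] += 1
        sums[arm] += r
        if len(history) > window:  # evict the oldest observation
            old_arm, old_r = history.popleft()
            counts[old_arm] -= 1
            sums[old_arm] -= old_r
        played.append(arm)
    return played


if __name__ == "__main__":
    def drifting(arm, t, rng):
        # Hypothetical abrupt change: the better arm switches at t = 2000.
        p = [0.8, 0.2] if t <= 2000 else [0.2, 0.8]
        return 1.0 if rng.random() < p[arm] else 0.0

    played = sliding_window_ucb(drifting, 2, horizon=4000, window=300)
    late = played[-1000:]
    print(late.count(0), late.count(1))  # arm 1 should dominate after the switch
```

This baseline stores the whole window, so its memory grows with `window`; the streaming setting summarized above restricts memory, so the sketch is only a point of comparison.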
