New Research Advances Contextual Bandit Algorithms for Dynamic, Complex Environments

By the PulseAugur editorial team · [9 sources] · 2026-05-18 15:01

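Researchers are developing new techniques for bandit problems, contextual bandits in particular, with a focus on tighter regret bounds and on handling dynamic environments. One paper introduces a retry-aware bandit algorithm that targets the best outcome across multiple attempts and proves the first sublinear regret bound for this objective. Another study proposes active context selection to improve simple regret in contextual bandits and reports significant gains over passive sampling. For offline contextual bandits, a new method called PONA uses action features to learn about and select new actions, outperforming existing methods that are restricted to a predefined action set. Finally, a method called RIE-Greedy applies regularization-induced exploration to contextual bandits, establishing a theoretical equivalence with Thompson Sampling and demonstrating practical effectiveness.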

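Impact: These papers introduce new algorithms and theoretical analyses for contextual bandit problems that could improve decision-making in recommender systems and other applications.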

Ranking rationale: This cluster contains several academic papers on theoretical advances in bandit algorithms.


AI-generated summary · Google Gemini · from 9 sources.

Sources [9]

arXiv cs.LG TIER_1 English(EN) · Shuche Wang, Adarsh Barik, Vincent Y. F. Tan · 2026-05-22 04:00

Bandit Convex Optimization with Adaptivity to Gradient Predictions

arXiv:2605.22191v1 Announce Type: new Abstract: Bandit convex optimization (BCO) is a fundamental online learning framework with partial feedback, where the learner observes only the loss incurred at the chosen decision point in each round. In this work, we investigate whether op…
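Because the learner sees only one loss value per round, gradients in bandit convex optimization must be estimated from that single evaluation. The sketch below shows the classic one-point spherical gradient estimator of Flaxman, Kalai and McMahan (2005), not the gradient-prediction method of this paper; the helper names, the ball-shaped feasible set and the step sizes are illustrative assumptions.

```python
import math
import random


def random_unit_vector(dim, rng):
    # Uniform direction on the unit sphere: normalize a standard Gaussian vector.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def one_point_gradient(loss, x, delta, rng):
    # Evaluate the loss at a single perturbed point (the only feedback in BCO)
    # and rescale it into an estimate of the gradient of a smoothed loss.
    d = len(x)
    u = random_unit_vector(d, rng)
    value = loss([xi + delta * ui for xi, ui in zip(x, u)])
    return [(d / delta) * value * ui for ui in u]


def bandit_gradient_descent(loss, x0, rounds, step, delta, radius, seed=0):
    # Hypothetical helper: projected descent on a Euclidean ball of `radius`.
    rng = random.Random(seed)
    x = list(x0)
    limit = radius - delta  # keeps every query x + delta * u inside the ball
    for _ in range(rounds):
        g = one_point_gradient(loss, x, delta, rng)
        x = [xi - step * gi for xi, gi in zip(x, g)]
        norm = math.sqrt(sum(xi * xi for xi in x))
        if norm > limit:
            x = [xi * limit / norm for xi in x]
    return x
```

For a linear loss the estimator is unbiased for the true gradient; for a general convex loss it estimates the gradient of a delta-smoothed version of the loss, which is why delta trades bias against variance.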
arXiv cs.LG TIER_1 English(EN) · Paavo Parmas · 2026-05-20 07:44

Finite-Time Regret Analysis: Retry-Aware Bandits

We study a stochastic bandit algorithm motivated by retry-aware objectives that value the best outcome among multiple attempts, such as pass@$k$ and max@$k$. Given a posterior over arm values, ReMax chooses a sampling distribution that maximizes the posterior expected maximum rew…
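The abstract describes choosing a sampling distribution over arms to maximize the posterior expected maximum reward. A minimal sketch of that objective for the pass@k case, assuming Bernoulli arms, k independent attempts that each draw an arm from the same distribution p, and a posterior represented by a finite list of samples of the arms' success probabilities. The helper names are hypothetical, and this is a brute-force evaluation over two arms, not the paper's ReMax procedure.

```python
def pass_at_k(p, mu, k):
    # Each of the k attempts picks arm i with probability p[i] and succeeds
    # with probability mu[i], so one attempt succeeds with q = sum_i p_i * mu_i.
    q = sum(pi * mi for pi, mi in zip(p, mu))
    return 1.0 - (1.0 - q) ** k


def posterior_expected_pass_at_k(p, posterior_samples, k):
    # Average pass@k over posterior samples of the arm success probabilities.
    return sum(pass_at_k(p, mu, k) for mu in posterior_samples) / len(posterior_samples)


def best_two_arm_mixture(posterior_samples, k, grid=10):
    # Grid search over distributions p = (t, 1 - t) on two arms.
    candidates = [(i / grid, 1 - i / grid) for i in range(grid + 1)]
    return max(candidates, key=lambda p: posterior_expected_pass_at_k(p, posterior_samples, k))
```

With the four equally likely posterior samples (0, 0), (1, 0), (0, 1) and (1, 1) and k = 2, committing to one arm gives 0.5 while the even mixture gives 0.625: when arm values are uncertain, spreading attempts raises the chance that at least one succeeds. With a single, certain sample the search puts all mass on the better arm.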
arXiv cs.LG TIER_1 English(EN) · Negar Kiyavash · 2026-05-19 16:01

Active Context Selection Improves Simple Regret in Contextual Bandits

We study the contextual multi-armed bandit problem with a finite context space (a.k.a. subpopulations), where the learner recommends a best action for each context and is evaluated by context-weighted simple regret. Our guarantees are worst-case over the reward distributions, whi…
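Context-weighted simple regret sums, over contexts, the weight of each context times the gap between the best action's mean reward and the mean reward of the action recommended for that context. A small sketch, assuming the true means are known (as in an evaluation harness) and the weights are given per context; all names and numbers are illustrative.

```python
def context_weighted_simple_regret(weights, mean_rewards, recommendation):
    # weights[c]: weight of context c, e.g. its share of the population
    # mean_rewards[c][a]: true mean reward of action a in context c
    # recommendation[c]: the single action the learner recommends for context c
    regret = 0.0
    for c, w in weights.items():
        best = max(mean_rewards[c].values())
        regret += w * (best - mean_rewards[c][recommendation[c]])
    return regret


weights = {"A": 0.7, "B": 0.3}
mean_rewards = {"A": {"x": 0.6, "y": 0.4}, "B": {"x": 0.2, "y": 0.5}}
recommendation = {"A": "x", "B": "x"}
print(context_weighted_simple_regret(weights, mean_rewards, recommendation))
```

Here the recommendation is optimal in context A and off by 0.3 in context B, so the regret is 0.3 × 0.3 = 0.09; a learner that can choose which context to sample can spend more samples where such weighted gaps are hardest to resolve.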
arXiv cs.LG TIER_1 English(EN) · Yuta Saito · 2026-05-18 15:01

Offline Contextual Bandits in the Presence of New Actions

Automated decision-making algorithms drive applications such as recommendation systems and search engines. These algorithms often rely on off-policy contextual bandits or off-policy learning (OPL). Conventionally, OPL selects actions that maximize the expected reward from an exis…
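Off-policy learning typically scores a candidate policy on logged data with an importance-weighted estimator. A minimal inverse propensity scoring (IPS) sketch, a standard baseline and not the PONA method, makes the limitation concrete: an action the logging policy never took has no logged rewards, so a policy that always picks it is estimated at zero regardless of its true value. The log format, the example data and the function names are assumptions.

```python
def ips_estimate(logs, target_prob):
    # logs: (context, action, reward, logging_prob) tuples from the deployed policy
    # target_prob(context, action): probability the candidate policy picks `action`
    total = 0.0
    for x, a, r, p0 in logs:
        total += target_prob(x, a) / p0 * r
    return total / len(logs)


def always_0(x, a):
    return 1.0 if a == 0 else 0.0


def always_new(x, a):
    # Action 2 never appears in the log: it is a "new" action.
    return 1.0 if a == 2 else 0.0


# Hypothetical log: the logging policy chose actions 0 and 1 uniformly at random.
logs = [("u1", 0, 1.0, 0.5), ("u2", 1, 0.0, 0.5), ("u3", 0, 1.0, 0.5), ("u4", 1, 1.0, 0.5)]

print(ips_estimate(logs, always_0))    # 1.0
print(ips_estimate(logs, always_new))  # 0.0
```

Methods that share information across actions through action features, as the summary attributes to PONA, are one way to say something about actions outside the logged set.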
arXiv stat.ML TIER_1 English(EN) · Avrim Blum, Marten Garicano, Kavya Ravichandran, Dravyansh Sharma · 2026-05-22 04:00

Algorithm Design and Stronger Guarantees for the Improving Multi-Armed Bandits Problem

arXiv:2511.10619v2 Announce Type: replace-cross Abstract: The improving multi-armed bandits problem is a formal model for allocating effort under uncertainty, motivated by scenarios such as investing research effort into new technologies, performing clinical trials, and hyperpara…
arXiv stat.ML TIER_1 English(EN) · Hamed Khosravi, Xiaoming Huo · 2026-05-21 04:00

Catching a Moving Subspace: Low-Rank Bandits in Non-Stationary Settings

arXiv:2605.20269v1 Announce Type: cross Abstract: Many bandit deployments (recommendation, clinical dosing, ad targeting) share two facts prior work handles only in isolation: rewards live on a low-dimensional latent subspace, and that subspace drifts. Stationary low-rank bandits…
arXiv stat.ML TIER_1 English(EN) · Sakshi Arya, Hyebin Song · 2026-05-21 04:00

Batched Single-Index Global Multi-Armed Bandits with Covariates

arXiv:2503.00565v3 Announce Type: replace Abstract: The multi-armed bandits (MAB) framework is a widely used approach for sequential decision-making, where a decision-maker selects an arm in each round with the goal of maximizing long-term rewards. In many practical applications,…
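In a batched bandit the learner commits to a whole batch of pulls and only revises its decisions between batches. A minimal sketch of batched successive elimination for plain Bernoulli arms, assuming no covariates and no single-index reward model (both central to the paper but omitted here); the batch schedule, the confidence parameter `log_inv_delta` and the function name are illustrative choices.

```python
import math
import random


def batched_elimination(means, batch_pulls, log_inv_delta=10.0, seed=0):
    # means: true Bernoulli success probabilities (unknown to the learner)
    # batch_pulls: pulls per active arm in each batch; arms are only
    # eliminated at batch boundaries, never inside a batch.
    rng = random.Random(seed)
    active = list(range(len(means)))
    counts = [0] * len(means)
    sums = [0.0] * len(means)
    for n in batch_pulls:
        for arm in active:
            for _ in range(n):
                sums[arm] += 1.0 if rng.random() < means[arm] else 0.0
                counts[arm] += 1
        # Hoeffding radius: each bound fails with probability at most 2 * exp(-log_inv_delta).
        radius = {a: math.sqrt(log_inv_delta / (2 * counts[a])) for a in active}
        mean = {a: sums[a] / counts[a] for a in active}
        best_lcb = max(mean[a] - radius[a] for a in active)
        # Keep every arm whose upper bound still reaches the best lower bound.
        active = [a for a in active if mean[a] + radius[a] >= best_lcb]
    return active, {a: sums[a] / counts[a] for a in active}


active, estimates = batched_elimination([0.2, 0.5, 0.8], [100, 200, 400, 800])
print(active, estimates)
```

Growing batch sizes let early batches discard clearly bad arms cheaply, while later, larger batches separate arms whose means are close.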
arXiv stat.ML TIER_1 English(EN) · Tong Li, Thiago de Queiroz Casanova, Eric M. Schwartz, Victor Kostyuk, Dehan Kong, Joseph J. Williams · 2026-05-19 04:00

RIE-Greedy: Regularization-Induced Exploration for Contextual Bandits

arXiv:2603.11276v2 Announce Type: replace Abstract: Real-world contextual bandit problems with complex reward models are often tackled with iteratively trained models, such as boosting trees. However, it is difficult to directly apply simple and effective exploration strategies--…
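The summary reports that RIE-Greedy is theoretically equivalent to Thompson Sampling. For reference, a minimal Beta-Bernoulli Thompson Sampling loop is sketched here under simplifying assumptions (independent Bernoulli arms, no context, uniform Beta(1, 1) priors). It illustrates the exploration strategy the summary names, not RIE-Greedy itself, which targets iteratively trained reward models such as boosting trees.

```python
import random


def thompson_sampling(means, rounds, seed=0):
    # means: true Bernoulli success probabilities (unknown to the learner)
    rng = random.Random(seed)
    k = len(means)
    alpha = [1] * k  # 1 + observed successes per arm (Beta(1, 1) prior)
    beta = [1] * k   # 1 + observed failures per arm
    pulls = [0] * k
    for _ in range(rounds):
        # Draw one plausible success rate per arm from its posterior,
        # then act greedily with respect to the draws.
        draws = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: draws[i])
        reward = 1 if rng.random() < means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


print(thompson_sampling([0.2, 0.8], 2000))
```

Exploration here comes entirely from posterior randomness: arms with uncertain estimates occasionally produce high draws and get tried, which is the behavior an equivalence result would need to reproduce through regularization instead.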
arXiv stat.ML TIER_1 English(EN) · Xiaoming Huo · 2026-05-18 22:01

Catching a Moving Subspace: Low-Rank Bandits Beyond Stationarity

Many bandit deployments (recommendation, clinical dosing, ad targeting) share two facts prior work handles only in isolation: rewards live on a low-dimensional latent subspace, and that subspace drifts. Stationary low-rank bandits exploit rank but break under subspace change; non…
