
Thompson Sampling Advances on Risk-Averse and Gaussian Process Bandit Problems

By the PulseAugur editorial team · [5 sources] · 2026-06-08 08:26

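Two new research papers report advances in Thompson sampling for bandit problems. The first introduces an algorithm for risk-averse bandits with sub-Gaussian rewards that is asymptotically optimal for a range of risk functionals. The second proposes algorithms that jointly select the prior and minimize regret in Gaussian process bandits, and supports them with theoretical analysis and experiments.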

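Impact: These papers advance both the theoretical understanding of bandit problems and the algorithms available for them, and could improve decision-making in areas such as reinforcement learning and online optimization.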

Ranking rationale: Two academic papers posted on arXiv that describe new algorithms for bandit problems.


AI-generated summary · Google Gemini · from 5 sources.

Sources [5]

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-08 08:26

Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards

We prove that $\rho\text{-}\mathrm{NPTS}_{\mathrm{SG}}$, an anchor-free nonparametric Thompson Sampling algorithm for risk-averse bandits, achieves regret matching the instance-dependent lower bound to leading order in $\log n$, establishing it as asymptotically optimal for any cont…
arXiv stat.ML TIER_1 English(EN) · Shion Takeno, Shogo Iwazaki · 2026-06-11 04:00

On Regret Bounds of Thompson Sampling for Bayesian Optimization

arXiv:2603.09276v2 Announce Type: replace Abstract: We study a widely used Bayesian optimization method, Gaussian process Thompson sampling (GP-TS), under the assumption that the objective function is a sample path from a GP. Compared with the GP upper confidence bound (GP-UCB) w…
arXiv stat.ML TIER_1 English(EN) · Joel Q. L. Chang · 2026-06-09 04:00

Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards

arXiv:2606.09191v1 Announce Type: cross Abstract: We prove that $\rho\text{-}\mathrm{NPTS}_{\mathrm{SG}}$, an anchor-free nonparametric Thompson Sampling algorithm for risk-averse bandits, achieves regret matching the instance-dependent lower bound to leading order in $\log n$, e…
arXiv stat.ML TIER_1 English(EN) · Jack Sandberg, Morteza Haghir Chehreghani · 2026-06-09 04:00

Adaptive Prior Selection in Gaussian Process Bandits with Thompson Sampling

arXiv:2502.01226v4 Announce Type: replace-cross Abstract: Gaussian process (GP) bandits provide a powerful framework for performing blackbox optimization of unknown functions. The characteristics of the unknown function depend heavily on the assumed GP prior. Most work in the lit…
arXiv stat.ML TIER_1 English(EN) · Joel Q. L. Chang · 2026-06-08 08:26

Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards

We prove that $\rho\text{-}\mathrm{NPTS}_{\mathrm{SG}}$, an anchor-free nonparametric Thompson Sampling algorithm for risk-averse bandits, achieves regret matching the instance-dependent lower bound to leading order in $\log n$, establishing it as asymptotically optimal for any cont…
