PulseAugur
English(EN) Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards

Thompson Sampling Advances on Risk-Averse and Gaussian Process Bandit Problems

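Two new research papers examine advances in Thompson Sampling for bandit problems. The first introduces an algorithm for risk-averse bandits with sub-Gaussian rewards that achieves asymptotic optimality across a range of risk functionals. The second proposes algorithms for joint prior selection and regret minimization in Gaussian process bandits, and supports them with theoretical analysis and experiments.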

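Impact: These papers advance both the theoretical understanding of bandit problems and the algorithms available for them, and could improve decision-making in areas such as reinforcement learning and online optimization.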

Ranking rationale: Two academic papers posted on arXiv that detail new algorithms for bandit problems.


AI-generated summary · Google Gemini · from 4 sources.

Sources [4]

  1. Hugging Face Daily Papers TIER_1 English(EN) ·

    Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards

    We prove that $\rho\text{-}\mathrm{NPTS}_{\mathrm{SG}}$, an anchor-free nonparametric Thompson Sampling algorithm for risk-averse bandits, achieves regret matching the instance-dependent lower bound to leading order in $\log n$, establishing it as asymptotically optimal for any cont…

  2. arXiv stat.ML TIER_1 English(EN) · Joel Q. L. Chang ·

    Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards

    arXiv:2606.09191v1 Announce Type: cross Abstract: We prove that $\rho\text{-}\mathrm{NPTS}_{\mathrm{SG}}$, an anchor-free nonparametric Thompson Sampling algorithm for risk-averse bandits, achieves regret matching the instance-dependent lower bound to leading order in $\log n$, e…

  3. arXiv stat.ML TIER_1 English(EN) · Jack Sandberg, Morteza Haghir Chehreghani ·

    Adaptive Prior Selection in Gaussian Process Bandits with Thompson Sampling

    arXiv:2502.01226v4 Announce Type: replace-cross Abstract: Gaussian process (GP) bandits provide a powerful framework for performing blackbox optimization of unknown functions. The characteristics of the unknown function depend heavily on the assumed GP prior. Most work in the lit…

  4. arXiv stat.ML TIER_1 English(EN) · Joel Q. L. Chang ·

    Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards

    We prove that $\rho\text{-}\mathrm{NPTS}_{\mathrm{SG}}$, an anchor-free nonparametric Thompson Sampling algorithm for risk-averse bandits, achieves regret matching the instance-dependent lower bound to leading order in $\log n$, establishing it as asymptotically optimal for any cont…