PulseAugur

Thompson Sampling algorithms advance risk-averse and GP bandits

Two research papers advance Thompson Sampling for bandit problems. The first proves that a nonparametric Thompson Sampling algorithm for risk-averse bandits with sub-Gaussian rewards is asymptotically optimal across a range of risk functionals, with regret matching the instance-dependent lower bound to leading order. The second presents algorithms for joint prior selection and regret minimization in Gaussian process bandits, supported by theoretical analysis and experiments.
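For readers new to the technique, the sketch below shows textbook Beta-Bernoulli Thompson Sampling: sample a plausible mean for each arm from its posterior, play the arm with the largest sample, then update that arm's posterior. It is not the algorithm from either paper. The function name `thompson_sampling_bernoulli`, the arm means, the horizon, and the seed are all illustrative assumptions.

```python
import random


def thompson_sampling_bernoulli(true_means, horizon, seed=0):
    """Textbook Thompson Sampling with Beta(1, 1) priors on Bernoulli arms.

    Illustrative only: this is the classic risk-neutral variant, not the
    risk-averse or Gaussian process methods described in the papers.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k  # observed rewards of 1 per arm
    failures = [0] * k   # observed rewards of 0 per arm
    pulls = [0] * k

    for _ in range(horizon):
        # Draw one sample per arm from its Beta posterior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(k)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=lambda a: samples[a])
        # Simulate a Bernoulli reward from the (hidden) true mean.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


# Hypothetical three-armed problem; the last arm has the highest mean.
pulls = thompson_sampling_bernoulli([0.2, 0.5, 0.8], horizon=2000)
print(pulls)
```

As the posteriors concentrate, the suboptimal arms are sampled less and less often, so most pulls should end up on the arm with the highest mean.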

IMPACT These papers advance theoretical understanding and algorithmic capabilities in bandit problems, potentially improving decision-making in areas like reinforcement learning and online optimization.

RANK_REASON Two academic papers published on arXiv detailing novel algorithms for bandit problems.

AI-generated summary · Google Gemini · from 4 sources.

COVERAGE [4]

  1. Hugging Face Daily Papers TIER_1 English(EN) ·

    Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards

    We prove that $\rho\text{-}\mathrm{NPTS}_{\mathrm{SG}}$, an anchor-free nonparametric Thompson Sampling algorithm for risk-averse bandits, achieves regret matching the instance-dependent lower bound to leading order in $\log n$, establishing it as asymptotically optimal for any cont…

  2. arXiv stat.ML TIER_1 English(EN) · Joel Q. L. Chang ·

    Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards

    arXiv:2606.09191v1. We prove that $\rho\text{-}\mathrm{NPTS}_{\mathrm{SG}}$, an anchor-free nonparametric Thompson Sampling algorithm for risk-averse bandits, achieves regret matching the instance-dependent lower bound to leading order in $\log n$, e…

  3. arXiv stat.ML TIER_1 English(EN) · Jack Sandberg, Morteza Haghir Chehreghani ·

    Adaptive Prior Selection in Gaussian Process Bandits with Thompson Sampling

    arXiv:2502.01226v4. Gaussian process (GP) bandits provide a powerful framework for performing blackbox optimization of unknown functions. The characteristics of the unknown function depend heavily on the assumed GP prior. Most work in the lit…

  4. arXiv stat.ML TIER_1 English(EN) · Joel Q. L. Chang ·

    Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards

    We prove that $\rho\text{-}\mathrm{NPTS}_{\mathrm{SG}}$, an anchor-free nonparametric Thompson Sampling algorithm for risk-averse bandits, achieves regret matching the instance-dependent lower bound to leading order in $\log n$, establishing it as asymptotically optimal for any cont…