Thompson Sampling algorithms advance risk-averse and GP bandits

By PulseAugur Editorial · [5 sources] · 2026-06-08 08:26

Two new research papers advance Thompson Sampling for bandit problems. The first introduces an algorithm for risk-averse bandits with sub-Gaussian rewards and proves it asymptotically optimal for a range of risk functionals. The second presents algorithms for joint prior selection and regret minimization in Gaussian Process bandits, supported by theoretical analysis and experiments.

IMPACT These papers advance theoretical understanding and algorithmic capabilities in bandit problems, potentially improving decision-making in areas like reinforcement learning and online optimization.

RANK_REASON Two academic papers published on arXiv detailing novel algorithms for bandit problems.

AI-generated summary · Google Gemini · from 5 sources.

COVERAGE [5]

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-08 08:26

Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards

We prove that $\rho\text{-}\mathrm{NPTS}_{\mathrm{SG}}$, an anchor-free nonparametric Thompson Sampling algorithm for risk-averse bandits, achieves regret matching the instance-dependent lower bound to leading order in $\log n$, establishing it as asymptotically optimal for any cont…
arXiv stat.ML TIER_1 English(EN) · Shion Takeno, Shogo Iwazaki · 2026-06-11 04:00

On Regret Bounds of Thompson Sampling for Bayesian Optimization

arXiv:2603.09276v2 Announce Type: replace Abstract: We study a widely used Bayesian optimization method, Gaussian process Thompson sampling (GP-TS), under the assumption that the objective function is a sample path from a GP. Compared with the GP upper confidence bound (GP-UCB) w…
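For readers unfamiliar with GP-TS, the loop can be sketched as follows. This is a minimal illustration on a finite grid, not the setting or analysis of the paper: the RBF kernel, the lengthscale, the noise level, and the helper names `rbf_kernel`, `gp_posterior`, and `gp_thompson_sampling` are all assumptions chosen for the example.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    # Squared-exponential kernel between two 1-D arrays of inputs.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, noise_var, lengthscale=0.2):
    # Posterior mean and covariance of a zero-mean GP on the grid.
    prior_cov = rbf_kernel(x_grid, x_grid, lengthscale)
    if len(x_obs) == 0:
        return np.zeros(len(x_grid)), prior_cov
    K = rbf_kernel(x_obs, x_obs, lengthscale) + noise_var * np.eye(len(x_obs))
    Ks = rbf_kernel(x_grid, x_obs, lengthscale)
    mean = Ks @ np.linalg.solve(K, y_obs)
    cov = prior_cov - Ks @ np.linalg.solve(K, Ks.T)
    return mean, cov

def gp_thompson_sampling(objective, x_grid, n_rounds=30, noise_std=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x_obs, y_obs = np.empty(0), np.empty(0)
    for _ in range(n_rounds):
        mean, cov = gp_posterior(x_obs, y_obs, x_grid, noise_std ** 2)
        # Symmetrize and add jitter so the covariance is numerically valid.
        cov = 0.5 * (cov + cov.T) + 1e-8 * np.eye(len(x_grid))
        # Thompson step: draw one posterior sample path, query its maximizer.
        path = rng.multivariate_normal(mean, cov, check_valid="ignore")
        x_next = x_grid[np.argmax(path)]
        y_next = objective(x_next) + noise_std * rng.normal()
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, y_next)
    return x_obs, y_obs
```

Calling `gp_thompson_sampling(lambda x: -(x - 0.3) ** 2, np.linspace(0.0, 1.0, 50))` returns the queried points and their noisy observations. Restricting the search to a grid sidesteps the question of how to maximize a sample path over a continuous domain.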
arXiv stat.ML TIER_1 English(EN) · Joel Q. L. Chang · 2026-06-09 04:00

Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards

arXiv:2606.09191v1 Announce Type: cross Abstract: We prove that $\rho\text{-}\mathrm{NPTS}_{\mathrm{SG}}$, an anchor-free nonparametric Thompson Sampling algorithm for risk-averse bandits, achieves regret matching the instance-dependent lower bound to leading order in $\log n$, e…
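As a rough illustration of what nonparametric Thompson Sampling with a risk functional can look like, the sketch below reweights each arm's observed rewards with Dirichlet weights and ranks arms by the risk of the reweighted distribution. It is not the paper's $\rho\text{-}\mathrm{NPTS}_{\mathrm{SG}}$, whose details are not given in the truncated abstract: the choice of CVaR as the risk functional, the two initial pulls per arm, and the names `weighted_cvar` and `risk_averse_npts` are assumptions made for this example.

```python
import numpy as np

def weighted_cvar(values, weights, alpha):
    # Mean of the worst alpha-fraction of a weighted discrete reward distribution.
    order = np.argsort(values)
    v, w = values[order], weights[order]
    mass_before = np.cumsum(w) - w
    # Portion of each atom's weight that lies inside the lower alpha tail.
    tail_mass = np.clip(alpha - mass_before, 0.0, w)
    return float(np.dot(v, tail_mass) / alpha)

def risk_averse_npts(arms, n_rounds, alpha=0.25, seed=0):
    # arms: list of callables, each taking a NumPy Generator and returning a reward.
    rng = np.random.default_rng(seed)
    # Assumption for this sketch: start with two pulls of every arm.
    history = [[arm(rng), arm(rng)] for arm in arms]
    for _ in range(n_rounds - 2 * len(arms)):
        indices = []
        for rewards in history:
            r = np.asarray(rewards)
            # Random reweighting of the empirical distribution of this arm.
            weights = rng.dirichlet(np.ones(len(r)))
            indices.append(weighted_cvar(r, weights, alpha))
        best = int(np.argmax(indices))
        history[best].append(arms[best](rng))
    return [len(h) for h in history]
```

With `alpha=1.0` the index reduces to a randomly reweighted mean, which corresponds to the risk-neutral case. Smaller values of `alpha` penalize arms whose observed rewards have a heavy lower tail.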
arXiv stat.ML TIER_1 English(EN) · Jack Sandberg, Morteza Haghir Chehreghani · 2026-06-09 04:00

Adaptive Prior Selection in Gaussian Process Bandits with Thompson Sampling

arXiv:2502.01226v4 Announce Type: replace-cross Abstract: Gaussian process (GP) bandits provide a powerful framework for performing blackbox optimization of unknown functions. The characteristics of the unknown function depend heavily on the assumed GP prior. Most work in the lit…
arXiv stat.ML TIER_1 English(EN) · Joel Q. L. Chang · 2026-06-08 08:26

Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards

We prove that $\rho\text{-}\mathrm{NPTS}_{\mathrm{SG}}$, an anchor-free nonparametric Thompson Sampling algorithm for risk-averse bandits, achieves regret matching the instance-dependent lower bound to leading order in $\log n$, establishing it as asymptotically optimal for any cont…
