PulseAugur
research · [33 sources]

Researchers advance bandit algorithms and Bayesian optimization for efficient sequential decision-making and hyperparameter tuning

Several recent arXiv papers explore advances in the multi-armed bandit problem, a framework for sequential decision-making under uncertainty. Research includes handling changing action availability with "Flickering Multi-Armed Bandits" and improving regret bounds in logistic bandits without strict context diversity assumptions. Other work covers geometry-aware offline-to-online learning, spectral bandits for smooth functions on graphs, and privacy-preserving algorithms for generalized linear contextual bandits.

Summary written by gemini-2.5-flash-lite from 33 sources. How we write summaries →

IMPACT Advances in bandit algorithms could lead to more efficient online learning systems and improved decision-making in recommendation, advertising, and resource allocation.

RANK_REASON Multiple arXiv papers were published on theoretical advances in multi-armed bandit algorithms.
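
The exploration-versus-exploitation trade-off behind the bandit work summarized above can be illustrated with a minimal sketch. The epsilon-greedy loop below for a Bernoulli multi-armed bandit is a generic illustration, not the method of any paper covered here; the arm probabilities, epsilon value, and horizon are hypothetical choices.

```python
import random

def epsilon_greedy_bernoulli(true_probs, epsilon=0.1, horizon=1000, seed=0):
    """Minimal epsilon-greedy loop for a Bernoulli multi-armed bandit.

    With probability epsilon the agent explores a uniformly random arm;
    otherwise it exploits the arm with the highest empirical mean reward.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    pulls = [0] * n_arms      # times each arm has been played
    means = [0.0] * n_arms    # running empirical mean reward per arm
    total_reward = 0

    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: means[a])   # exploit
        reward = 1 if rng.random() < true_probs[arm] else 0    # Bernoulli draw
        pulls[arm] += 1
        means[arm] += (reward - means[arm]) / pulls[arm]       # incremental mean
        total_reward += reward

    return total_reward, means

if __name__ == "__main__":
    # Hypothetical three-armed bandit; arm 2 (p = 0.7) is the best choice.
    print(epsilon_greedy_bernoulli([0.2, 0.5, 0.7]))
```

With these toy parameters the loop typically concentrates most pulls on the highest-probability arm while still sampling the others occasionally, which is the basic behavior the papers below refine with tighter regret guarantees.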

Read on Lil'Log (Lilian Weng) →

COVERAGE [33]

  1. Lil'Log (Lilian Weng) TIER_1 ·

    The Multi-Armed Bandit Problem and Its Solutions

    The multi-armed bandit problem is a classic example used to demonstrate the exploration versus exploitation dilemma. This post introduces the bandit problem and how to solve it using different exploration strategies. The algorithms are implemented for the Bernoulli bandit in …

  2. arXiv cs.LG TIER_1 · Maresa Schröder, Pascal Janetzky, Michael Klar, Stefan Feuerriegel ·

    ORTHOBO: Orthogonal Bayesian Hyperparameter Optimization

    arXiv:2605.06454v1 Announce Type: new Abstract: Bayesian optimization is widely used for hyperparameter optimization when model evaluations are expensive; however, noisy acquisition estimates can lead to unstable decisions. We identify acquisition estimation noise as a failure mo…

  3. arXiv cs.LG TIER_1 · Becky Langdon, Gabriel D. Patrón, Chrysoula D. Kappatou, Robert M. Lee, Behrang Shafei, Jixiang Qing, Ruth Misener, Mark van der Wilk, Calvin Tsay ·

    Meta-learning for sample-efficient Bayesian optimisation of fed-batch processes

    arXiv:2605.05382v1 Announce Type: cross Abstract: The optimisation of fed-batch (bio)chemical process recipes is subject to inherent, underlying, and unmeasurable fluctuations across batches, whose trajectories are difficult to model and costly to measure. Bayesian Optimisation (…

  4. arXiv cs.AI TIER_1 · Stefan Feuerriegel ·

    ORTHOBO: Orthogonal Bayesian Hyperparameter Optimization

    Bayesian optimization is widely used for hyperparameter optimization when model evaluations are expensive; however, noisy acquisition estimates can lead to unstable decisions. We identify acquisition estimation noise as a failure mode that was previously overlooked: even when the…

  5. arXiv cs.LG TIER_1 · Noel Thomas ·

    Regime-Conditioned Evaluation in Multi-Context Bayesian Optimization

    arXiv:2605.04895v1 Announce Type: new Abstract: Published transfer-BO comparisons often estimate an average treatment effect of acquisition choice over hidden regime variables, while practitioners need the conditional effect for their specific prior quality, budget ratio, and met…

  6. arXiv cs.LG TIER_1 · Sofianos Panagiotis Fotias, Vassilis Gaganis ·

    Inducing Permutation Invariant Priors in Bayesian Optimization for Carbon Capture and Storage Applications

    arXiv:2605.02409v1 Announce Type: new Abstract: Bayesian Optimization is an iterative method, tailored to optimizing expensive black box objective functions. Surrogate models like Gaussian Processes, which are the gold standard in Bayesian Optimization, can be inefficient for inp…

  7. arXiv cs.LG TIER_1 · Sofianos Panagiotis Fotias, Vassilis Gaganis ·

    Bayesian Neural Network Surrogates for Bayesian Optimization of Carbon Capture and Storage Operations

    arXiv:2507.21803v2 Announce Type: replace Abstract: Carbon Capture and Storage (CCS) stands as a pivotal technology for fostering a sustainable future. The process, which involves injecting supercritical CO$_2$ into underground formations, a method already widely used for Enhance…

  8. arXiv cs.LG TIER_1 · Vassilis Gaganis ·

    Inducing Permutation Invariant Priors in Bayesian Optimization for Carbon Capture and Storage Applications

    Bayesian Optimization is an iterative method, tailored to optimizing expensive black box objective functions. Surrogate models like Gaussian Processes, which are the gold standard in Bayesian Optimization, can be inefficient for inputs with permutation symmetries, as the most com…

  9. arXiv cs.LG TIER_1 · Jesse Schneider, William J. Welch ·

    Bayesian Optimization in Linear Time

    arXiv:2605.00237v1 Announce Type: new Abstract: Bayesian optimization is a sequential method for minimizing objective functions that are expensive to evaluate and about which few assumptions can be made. By using all gathered data to train a Gaussian process model for the functio…

  10. arXiv cs.AI TIER_1 · Ziwei Zhang, Jonathan Yu-Meng Li ·

    Sampler-Robust Optimization under Generative Models

    arXiv:2604.27447v1 Announce Type: cross Abstract: Modern stochastic optimization pipelines increasingly rely on learned generative models to represent uncertainty, while downstream decisions are evaluated almost entirely through Monte Carlo scenarios. This shifts the operational …

  11. Hugging Face Daily Papers TIER_1 ·

    Sampler-Robust Optimization under Generative Models

    Modern stochastic optimization pipelines increasingly rely on learned generative models to represent uncertainty, while downstream decisions are evaluated almost entirely through Monte Carlo scenarios. This shifts the operational object of uncertainty from an explicit probability…

  12. arXiv cs.LG TIER_1 · Zean Han, Ruihan Lin, Zezhen Ding, Jiheng Zhang ·

    Geometry-Aware Offline-to-Online Learning in Linear Contextual Bandits

    arXiv:2604.24016v1 Announce Type: new Abstract: We study offline-to-online learning in linear contextual bandits with biased offline regression data: the offline parameter need not match the online one, so history should not be treated as a single warm start. We model directional…

  13. arXiv cs.LG TIER_1 · Sourav Chakraborty, Amit Kiran Rege, Claire Monteleoni, Lijun Chen ·

    Flickering Multi-Armed Bandits

    arXiv:2602.17315v2 Announce Type: replace Abstract: We introduce Flickering Multi-Armed Bandits (FMAB) to model sequential decision-making in environments with changing action availability, where accessibility of the next action is restricted to a subset dependent on the agent's …

  14. Hugging Face Daily Papers TIER_1 ·

    A Finite Time Analysis of Thompson Sampling for Bayesian Optimization with Preferential Feedback

    Preference feedback, in the form of pairwise comparisons rather than scalar scores, has seen increasing use in applications such as human-, laboratory-, and expert-in-the-loop design, as well as scientific discovery. We propose a Thompson Sampling (TS) approach to Bayesian optimi…

  15. arXiv cs.LG TIER_1 · Seoungbin Bae, Dabeen Lee ·

    Logistic Bandits with $\tilde{O}(\sqrt{dT})$ Regret without Context Diversity Assumptions

    arXiv:2604.22161v1 Announce Type: new Abstract: We study the $K$-armed logistic bandit problem, where at each round, the agent observes $K$ feature vectors associated with $K$ actions. Existing approaches that achieve a rate-optimal $\tilde{\mathcal{O}}(\sqrt{dT})$ regret bound r…

  16. Hugging Face Daily Papers TIER_1 ·

    Geometry-Aware Offline-to-Online Learning in Linear Contextual Bandits

    We study offline-to-online learning in linear contextual bandits with biased offline regression data: the offline parameter need not match the online one, so history should not be treated as a single warm start. We model directional transfer with a shift certificate $(M_{\mathrm{…

  17. arXiv cs.LG TIER_1 · Dabeen Lee ·

    Logistic Bandits with $\tilde{O}(\sqrt{dT})$ Regret without Context Diversity Assumptions

    We study the $K$-armed logistic bandit problem, where at each round, the agent observes $K$ feature vectors associated with $K$ actions. Existing approaches that achieve a rate-optimal $\tilde{\mathcal{O}}(\sqrt{dT})$ regret bound rely heavily on context diversity assumptions, su…

  18. arXiv stat.ML TIER_1 · Noel Thomas ·

    Regime-Conditioned Evaluation in Multi-Context Bayesian Optimization

    Published transfer-BO comparisons often estimate an average treatment effect of acquisition choice over hidden regime variables, while practitioners need the conditional effect for their specific prior quality, budget ratio, and metric. An audit of 40 transfer-BO papers from Neur…

  19. arXiv stat.ML TIER_1 · Abhirami Pillai ·

    Budget-Constrained Causal Bandits: Bridging Uplift Modeling and Sequential Decision-Making

    arXiv:2604.26169v1 Announce Type: cross Abstract: Treatment allocation under budget constraints is a central challenge in digital advertising: advertisers must decide which users to show ads to while spending a limited budget wisely. The standard approach follows a two-stage offl…

  20. arXiv stat.ML TIER_1 · Gergely Neu, Michal Valko ·

    Online combinatorial optimization with stochastic decision sets and adversarial losses

    arXiv:2604.25269v1 Announce Type: cross Abstract: Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments…

  21. arXiv stat.ML TIER_1 · Tomáš Kocák, Rémi Munos, Branislav Kveton, Shipra Agrawal, Michal Valko ·

    Spectral bandits

    arXiv:2604.25272v1 Announce Type: new Abstract: Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this work, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learn…

  22. arXiv stat.ML TIER_1 · Abhirami Pillai ·

    Budget-Constrained Causal Bandits: Bridging Uplift Modeling and Sequential Decision-Making

    Treatment allocation under budget constraints is a central challenge in digital advertising: advertisers must decide which users to show ads to while spending a limited budget wisely. The standard approach follows a two-stage offline pipeline - first collect historical data to es…

  23. arXiv stat.ML TIER_1 · Michal Valko ·

    Spectral bandits

    Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this work, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as conten…

  24. arXiv stat.ML TIER_1 · Michal Valko ·

    Online combinatorial optimization with stochastic decision sets and adversarial losses

    Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked or goods that are out of stoc…

  25. arXiv stat.ML TIER_1 · Tomas Kocak, Gergely Neu, Michal Valko, Remi Munos ·

    Efficient learning by implicit exploration in bandit problems with side observations

    arXiv:2604.24555v1 Announce Type: cross Abstract: We consider online learning problems under a partial observability model capturing situations where the information conveyed to the learner is between full information and bandit feedback. In the simplest variant, we assume that i…

  26. arXiv stat.ML TIER_1 · Sahasrajit Sarmasarkar ·

    Shuffle and Joint Differential Privacy for Generalized Linear Contextual Bandits

    arXiv:2602.00417v2 Announce Type: replace Abstract: We present the first algorithms for generalized linear contextual bandits under shuffle differential privacy and joint differential privacy. While prior work on private contextual bandits has been restricted to linear reward mod…

  27. arXiv stat.ML TIER_1 · Remi Munos ·

    Efficient learning by implicit exploration in bandit problems with side observations

    We consider online learning problems under a partial observability model capturing situations where the information conveyed to the learner is between full information and bandit feedback. In the simplest variant, we assume that in addition to its own loss, the learner also gets …

  28. arXiv stat.ML TIER_1 · Jierui Zuo, Hanzhang Qin ·

    On Pareto Optimality for Parametric Choice Bandits

    arXiv:2501.19277v4 Announce Type: replace Abstract: We study online assortment optimization under stochastic choice when a decision maker simultaneously values cumulative revenue performance and the quality of post-hoc inference on revenue contrasts. We analyze a forced-explorati…

  29. arXiv stat.ML TIER_1 · Matías Carrasco, Alejandro Cholaquidis ·

    Concave Statistical Utility Maximization Bandits via Influence-Function Gradients

    arXiv:2604.22140v1 Announce Type: new Abstract: We study stochastic multi-armed bandits in which the objective is a statistical functional of the long-run reward distribution, rather than expected reward alone. Under mild continuity assumptions, we show that the infinite-horizon …

  30. arXiv stat.ML TIER_1 · Alejandro Cholaquidis ·

    Concave Statistical Utility Maximization Bandits via Influence-Function Gradients

    We study stochastic multi-armed bandits in which the objective is a statistical functional of the long-run reward distribution, rather than expected reward alone. Under mild continuity assumptions, we show that the infinite-horizon problem reduces to optimizing over stationary mi…

  31. arXiv stat.ML TIER_1 · Alejandro Cholaquidis ·

    Concave Statistical Utility Maximization Bandits via Influence-Function Gradients

    We study stochastic multi-armed bandits in which the objective is a statistical functional of the long-run reward distribution, rather than expected reward alone. Under mild continuity assumptions, we show that the infinite-horizon problem reduces to optimizing over stationary mi…

  32. arXiv stat.ML TIER_1 · Michal Valko ·

    A single algorithm for both restless and rested rotting bandits

    In many application domains (e.g., recommender systems, intelligent tutoring systems), the rewards associated to the actions tend to decrease over time. This decay is either caused by the actions executed in the past (e.g., a user may get bored when songs of the same genre are re…

  33. Eugene Yan TIER_1 ·

    Bandits for Recommender Systems

    Industry examples, exploration strategies, warm-starting, off-policy evaluation, and more.
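
Several of the Bayesian optimization entries above (e.g., ORTHOBO, the fed-batch meta-learning work, and the linear-time and carbon-capture papers) build on the same basic loop: fit a Gaussian process surrogate to the evaluations gathered so far, maximize an acquisition function such as expected improvement, and evaluate the objective at the selected point. The sketch below shows only that generic loop, assuming scikit-learn and SciPy are available; it is not any specific paper's method, and the Matérn kernel, random candidate grid, and toy objective are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(mu, sigma, best_y, xi=0.01):
    """Expected improvement (for minimization) given the GP posterior mean/std."""
    sigma = np.maximum(sigma, 1e-9)        # guard against zero predictive std
    imp = best_y - mu - xi
    z = imp / sigma
    return imp * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt(objective, bounds, n_init=5, n_iter=20, seed=0):
    """Minimize a 1-D black-box objective with a GP surrogate and EI."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(n_init, 1))            # initial random design
    y = np.array([objective(float(x)) for x in X.ravel()])

    for _ in range(n_iter):
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        gp.fit(X, y)                                      # refit surrogate each round
        cand = rng.uniform(lo, hi, size=(256, 1))         # random candidate grid
        mu, sigma = gp.predict(cand, return_std=True)
        ei = expected_improvement(mu, sigma, y.min())
        x_next = cand[np.argmax(ei)]                      # most promising candidate
        y_next = objective(float(x_next[0]))              # expensive evaluation
        X = np.vstack([X, x_next[None, :]])
        y = np.append(y, y_next)

    best = np.argmin(y)
    return float(X[best, 0]), float(y[best])

if __name__ == "__main__":
    # Toy objective with a minimum near x = 2.0 (purely illustrative).
    f = lambda x: (x - 2.0) ** 2 + 0.1 * np.sin(5 * x)
    print(bayes_opt(f, bounds=(-5.0, 5.0)))
```

The surrogate refit, acquisition maximization, and evaluation steps are the parts the covered papers modify, for example by stabilizing noisy acquisition estimates, encoding permutation-invariant priors, or reducing the cost of the Gaussian process update.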