Researchers advance Bayesian Optimization for efficient decision-making and hyperparameter tuning
By PulseAugur Editorial
Summary by gemini-2.5-flash-lite
from 33 sources
Several recent arXiv papers explore advancements in multi-armed bandit problems, a framework for sequential decision-making under uncertainty. Research includes handling changing action availability with "Flickering Multi-Armed Bandits" and improving regret bounds in logistic bandits without strict context diversity assumptions. Other work focuses on geometry-aware offline-to-online learning, spectral bandits for smooth functions on graphs, and privacy-preserving algorithms for generalized linear contextual bandits.
AI
IMPACT
Advances in bandit algorithms could lead to more efficient online learning systems and improved decision-making in recommendation, advertising, and resource allocation.
RANK_REASON
Multiple arXiv papers published on various theoretical advancements in multi-armed bandit algorithms.
The multi-armed bandit problem is a classic example used to demonstrate the exploration versus exploitation dilemma. This post introduces the bandit problem and how to solve it using different exploration strategies. The algorithms are implemented for a Bernoulli bandit in …
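As a concrete illustration of the exploration versus exploitation trade-off described above, here is a minimal epsilon-greedy agent on a three-armed Bernoulli bandit. The arm probabilities, epsilon, and horizon below are made-up illustration values, not taken from any of the papers:

```python
import random

class BernoulliBandit:
    """K arms; pulling arm i returns reward 1 with probability probs[i]."""
    def __init__(self, probs, seed=0):
        self.probs = probs
        self.rng = random.Random(seed)

    def pull(self, i):
        return 1 if self.rng.random() < self.probs[i] else 0

def epsilon_greedy(bandit, n_rounds=5000, eps=0.1, seed=1):
    """With probability eps explore a random arm, otherwise exploit
    the arm with the best running-mean reward estimate."""
    rng = random.Random(seed)
    k = len(bandit.probs)
    counts = [0] * k    # pulls per arm
    values = [0.0] * k  # running mean reward per arm
    for _ in range(n_rounds):
        if rng.random() < eps:
            arm = rng.randrange(k)                        # explore
        else:
            arm = max(range(k), key=lambda i: values[i])  # exploit
        r = bandit.pull(arm)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]    # incremental mean
    return values, counts

values, counts = epsilon_greedy(BernoulliBandit([0.2, 0.5, 0.75]))
```

Over enough rounds the running means converge to the true arm probabilities, and the exploit branch concentrates pulls on the best arm.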
arXiv cs.LG
TIER_1·Maresa Schröder, Pascal Janetzky, Michael Klar, Stefan Feuerriegel·
arXiv:2605.06454v1 Announce Type: new Abstract: Bayesian optimization is widely used for hyperparameter optimization when model evaluations are expensive; however, noisy acquisition estimates can lead to unstable decisions. We identify acquisition estimation noise as a failure mo…
arXiv cs.LG
TIER_1·Becky Langdon, Gabriel D. Patrón, Chrysoula D. Kappatou, Robert M. Lee, Behrang Shafei, Jixiang Qing, Ruth Misener, Mark van der Wilk, Calvin Tsay·
arXiv:2605.05382v1 Announce Type: cross Abstract: The optimisation of fed-batch (bio)chemical process recipes is subject to inherent, underlying, and unmeasurable fluctuations across batches, whose trajectories are difficult to model and costly to measure. Bayesian Optimisation (…
arXiv:2605.04895v1 Announce Type: new Abstract: Published transfer-BO comparisons often estimate an average treatment effect of acquisition choice over hidden regime variables, while practitioners need the conditional effect for their specific prior quality, budget ratio, and met…
arXiv:2605.02409v1 Announce Type: new Abstract: Bayesian Optimization is an iterative method, tailored to optimizing expensive black box objective functions. Surrogate models like Gaussian Processes, which are the gold standard in Bayesian Optimization, can be inefficient for inp…
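One common way to make a surrogate model respect permutation symmetry, in the spirit of the abstract above, is to symmetrize a base kernel by averaging it over all permutations of the input coordinates. The sketch below (a plain-Python RBF base kernel on three-dimensional inputs) is a generic construction, not necessarily the paper's method, and its factorial cost limits it to low dimensions:

```python
import itertools
import math

def rbf(x, y, lengthscale=1.0):
    """Base RBF kernel between two equal-length vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * lengthscale ** 2))

def perm_invariant_kernel(x, y, base=rbf):
    """Average the base kernel over all permutations of one argument.
    Because the RBF is invariant under permuting both arguments jointly,
    the result is invariant to reordering x or y (factorial cost!)."""
    perms = list(itertools.permutations(range(len(x))))
    return sum(base(tuple(x[i] for i in p), y) for p in perms) / len(perms)

k1 = perm_invariant_kernel((1.0, 2.0, 3.0), (0.5, 0.5, 0.5))
k2 = perm_invariant_kernel((3.0, 1.0, 2.0), (0.5, 0.5, 0.5))  # x reordered
# k1 equals k2: reordering the input coordinates leaves the kernel unchanged
```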
arXiv:2507.21803v2 Announce Type: replace Abstract: Carbon Capture and Storage (CCS) stands as a pivotal technology for fostering a sustainable future. The process, which involves injecting supercritical CO$_2$ into underground formations, a method already widely used for Enhance…
arXiv cs.LG
TIER_1·Jesse Schneider, William J. Welch·
arXiv:2605.00237v1 Announce Type: new Abstract: Bayesian optimization is a sequential method for minimizing objective functions that are expensive to evaluate and about which few assumptions can be made. By using all gathered data to train a Gaussian process model for the functio…
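The loop this abstract describes (fit a Gaussian process to all gathered data, then choose the next evaluation) can be sketched with a small NumPy implementation using the expected-improvement acquisition on a 1-D grid. The toy objective, kernel lengthscale, and grid are illustrative choices, not from the paper:

```python
import math
import numpy as np

def kern(a, b, ls=0.3):
    """RBF kernel matrix between two 1-D arrays of points."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ls ** 2))

def gp_posterior(X, y, Xs, noise=1e-6):
    """Posterior mean and stddev of a zero-mean GP at query points Xs."""
    K = kern(X, X) + noise * np.eye(len(X))
    Ks = kern(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = np.clip(1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sd, f_best):
    """EI for minimization: (f_best - mu) * Phi(z) + sd * phi(z)."""
    z = (f_best - mu) / sd
    Phi = np.array([0.5 * (1 + math.erf(zi / math.sqrt(2))) for zi in z])
    phi = np.exp(-z ** 2 / 2) / math.sqrt(2 * math.pi)
    return (f_best - mu) * Phi + sd * phi

def objective(x):  # stand-in for an expensive black box (illustrative)
    return np.sin(3 * x) + 0.5 * x

rng = np.random.default_rng(0)
X = rng.uniform(0, 2, 3)        # small initial design
y = objective(X)
grid = np.linspace(0, 2, 200)
for _ in range(10):             # BO loop: fit GP, maximize EI, evaluate
    mu, sd = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sd, y.min()))]
    X, y = np.append(X, x_next), np.append(y, objective(x_next))
```

Each iteration re-fits the GP to all evaluations so far, which is exactly the "use all gathered data" property the abstract highlights.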
arXiv cs.AI
TIER_1·Ziwei Zhang, Jonathan Yu-Meng Li·
arXiv:2604.27447v1 Announce Type: cross Abstract: Modern stochastic optimization pipelines increasingly rely on learned generative models to represent uncertainty, while downstream decisions are evaluated almost entirely through Monte Carlo scenarios. This shifts the operational …
arXiv:2604.24016v1 Announce Type: new Abstract: We study offline-to-online learning in linear contextual bandits with biased offline regression data: the offline parameter need not match the online one, so history should not be treated as a single warm start. We model directional…
arXiv:2602.17315v2 Announce Type: replace Abstract: We introduce Flickering Multi-Armed Bandits (FMAB) to model sequential decision-making in environments with changing action availability, where accessibility of the next action is restricted to a subset dependent on the agent's …
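The FMAB algorithm itself is the paper's contribution; as a generic baseline for changing action availability, one can run UCB1 restricted each round to the currently available arms (a "sleeping bandit" style heuristic, not the paper's method). The availability process and arm means below are invented for illustration:

```python
import math
import random

def ucb_available(probs, availability, n_rounds=4000, seed=0):
    """UCB1 where only a subset of arms is playable each round.
    availability(rng) -> non-empty set of playable arm indices."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, n_rounds + 1):
        avail = availability(rng)
        untried = [i for i in sorted(avail) if counts[i] == 0]
        if untried:
            arm = untried[0]                 # play each arm once first
        else:                                # UCB index over available arms
            arm = max(avail, key=lambda i:
                      means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        r = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]
    return counts, means

# each arm independently "flickers" available with probability 0.8
def avail_fn(rng):
    s = {i for i in range(3) if rng.random() < 0.8}
    return s or {0}                          # fall back if all unavailable

counts, means = ucb_available([0.3, 0.5, 0.7], avail_fn)
```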
Preference feedback, in the form of pairwise comparisons rather than scalar scores, has seen increasing use in applications such as human-, laboratory-, and expert-in-the-loop design, as well as scientific discovery. We propose a Thompson Sampling (TS) approach to Bayesian optimi…
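In the simpler scalar-feedback Bernoulli setting (not the paper's preference-feedback variant), Thompson Sampling amounts to keeping a Beta posterior per arm, sampling a mean from each posterior, and playing the argmax. The arm probabilities and horizon below are illustrative:

```python
import random

def thompson_bernoulli(probs, n_rounds=5000, seed=0):
    """Thompson Sampling with Beta(1, 1) priors on Bernoulli arms."""
    rng = random.Random(seed)
    k = len(probs)
    a = [1] * k  # posterior alpha = 1 + successes
    b = [1] * k  # posterior beta  = 1 + failures
    counts = [0] * k
    for _ in range(n_rounds):
        draws = [rng.betavariate(a[i], b[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: draws[i])  # play sampled best
        r = 1 if rng.random() < probs[arm] else 0
        a[arm] += r
        b[arm] += 1 - r
        counts[arm] += 1
    return counts

counts = thompson_bernoulli([0.3, 0.5, 0.7])
```

Posterior sampling does the exploration automatically: uncertain arms occasionally produce high draws and get tried, while confidently bad arms are sampled less and less.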
arXiv:2604.22161v1 Announce Type: new Abstract: We study the $K$-armed logistic bandit problem, where at each round, the agent observes $K$ feature vectors associated with $K$ actions. Existing approaches that achieve a rate-optimal $\tilde{\mathcal{O}}(\sqrt{dT})$ regret bound r…
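The logistic bandit's reward model can be simulated directly: the chosen context x yields a Bernoulli reward with mean sigmoid(x . theta*). The sketch below pairs it with an epsilon-greedy policy and an online SGD estimate of theta, a naive baseline rather than the paper's rate-optimal algorithm; all parameters are illustrative:

```python
import math
import random

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def logistic_bandit_sgd(theta_star, n_rounds=20000, K=5, eps=0.1, lr=0.05, seed=0):
    """K-armed logistic bandit: picking context x yields a Bernoulli
    reward with mean sigmoid(x . theta_star). Epsilon-greedy policy with
    an online SGD estimate of theta (a naive baseline)."""
    rng = random.Random(seed)
    d = len(theta_star)
    theta = [0.0] * d
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    hits = 0
    for _ in range(n_rounds):
        contexts = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(K)]
        if rng.random() < eps:
            arm = rng.randrange(K)
        else:
            arm = max(range(K), key=lambda i: dot(contexts[i], theta))
        x = contexts[arm]
        r = 1 if rng.random() < sigmoid(dot(x, theta_star)) else 0
        g = sigmoid(dot(x, theta)) - r       # log-loss gradient scale
        theta = [w - lr * g * xi for w, xi in zip(theta, x)]
        best = max(range(K), key=lambda i: dot(contexts[i], theta_star))
        hits += (arm == best)
    return theta, hits / n_rounds

theta, hit_rate = logistic_bandit_sgd([1.0, -1.0])
```

Because the model is well-specified, the SGD estimate drifts toward the direction of theta*, and the greedy arm increasingly coincides with the truly best arm.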
arXiv:2604.26169v1 Announce Type: cross Abstract: Treatment allocation under budget constraints is a central challenge in digital advertising: advertisers must decide which users to show ads to while spending a limited budget wisely. The standard approach follows a two-stage offl…
arXiv:2604.25269v1 Announce Type: cross Abstract: Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments…
arXiv:2604.25272v1 Announce Type: new Abstract: Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this work, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learn…
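The smoothness notion behind spectral bandits is typically the Dirichlet energy x^T L x of the payoff vector with respect to the graph Laplacian L: small energy means neighboring arms have similar payoffs. A minimal numeric check on a 5-node path graph, with signal values invented for illustration:

```python
import numpy as np

# path graph on 5 nodes; Laplacian L = D - A
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

def dirichlet_energy(x):
    """x^T L x = sum over edges of (x_i - x_j)^2; small for smooth signals."""
    return float(x @ L @ x)

smooth = np.array([1.0, 1.1, 1.2, 1.3, 1.4])   # varies slowly along the path
rough = np.array([1.0, -1.0, 1.0, -1.0, 1.0])  # flips sign at every edge
e_smooth = dirichlet_energy(smooth)  # 4 edges * 0.1^2 = 0.04
e_rough = dirichlet_energy(rough)    # 4 edges * 2^2   = 16.0
```

An algorithm that assumes small x^T L x can share information across neighboring arms instead of learning each arm's payoff independently.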
arXiv stat.ML
TIER_1·Tomas Kocak, Gergely Neu, Michal Valko, Remi Munos·
arXiv:2604.24555v1 Announce Type: cross Abstract: We consider online learning problems under a partial observability model capturing situations where the information conveyed to the learner is between full information and bandit feedback. In the simplest variant, we assume that i…
arXiv:2602.00417v2 Announce Type: replace Abstract: We present the first algorithms for generalized linear contextual bandits under shuffle differential privacy and joint differential privacy. While prior work on private contextual bandits has been restricted to linear reward mod…
arXiv:2501.19277v4 Announce Type: replace Abstract: We study online assortment optimization under stochastic choice when a decision maker simultaneously values cumulative revenue performance and the quality of post-hoc inference on revenue contrasts. We analyze a forced-explorati…
arXiv:2604.22140v1 Announce Type: new Abstract: We study stochastic multi-armed bandits in which the objective is a statistical functional of the long-run reward distribution, rather than expected reward alone. Under mild continuity assumptions, we show that the infinite-horizon …
In many application domains (e.g., recommender systems, intelligent tutoring systems), the rewards associated with the actions tend to decrease over time. This decay is either caused by the actions executed in the past (e.g., a user may get bored when songs of the same genre are re…
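A simple way to cope with rewards that decay as an arm is pulled, in the spirit of the abstract above, is to score each arm by a sliding-window mean so that stale high rewards age out. This is a generic heuristic, not the paper's method; the decay schedule, noise level, and horizon are invented:

```python
import random
from collections import deque

def sliding_window_greedy(decay_means, n_rounds=3000, window=100, eps=0.1, seed=0):
    """Two-armed bandit whose expected reward decays with pulls.
    decay_means(arm, n_pulls) -> current mean reward of `arm`.
    Arms are scored by a sliding-window mean so stale rewards age out."""
    rng = random.Random(seed)
    k = 2
    pulls = [0] * k
    recent = [deque(maxlen=window) for _ in range(k)]
    for _ in range(n_rounds):
        if rng.random() < eps or any(len(q) == 0 for q in recent):
            arm = rng.randrange(k)           # explore / fill empty windows
        else:
            arm = max(range(k), key=lambda i: sum(recent[i]) / len(recent[i]))
        reward = decay_means(arm, pulls[arm]) + rng.gauss(0, 0.05)
        pulls[arm] += 1
        recent[arm].append(reward)
    return pulls

# arm 0 starts high but rots with each pull; arm 1 is steady (illustrative)
def decay(arm, n):
    return max(0.9 - 0.01 * n, 0.1) if arm == 0 else 0.5

pulls = sliding_window_greedy(decay)
```

Once arm 0 has rotted below arm 1's level, its window mean drops and the policy switches to the steady arm, which a plain all-history mean would do much more slowly.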