Researchers advance Bayesian Optimization for efficient decision-making and hyperparameter tuning
By PulseAugur Editorial
Summary by gemini-2.5-flash-lite
from 33 sources
Several recent arXiv papers explore advancements in multi-armed bandit problems, a framework for sequential decision-making under uncertainty. Research includes handling changing action availability with "Flickering Multi-Armed Bandits" and improving regret bounds in logistic bandits without strict context diversity assumptions. Other work focuses on geometry-aware offline-to-online learning, spectral bandits for smooth functions on graphs, and privacy-preserving algorithms for generalized linear contextual bandits.
AI
IMPACT
Advances in bandit algorithms could lead to more efficient online learning systems and improved decision-making in recommendation, advertising, and resource allocation.
RANK_REASON
Multiple arXiv papers published on various theoretical advancements in multi-armed bandit algorithms.
The multi-armed bandit problem is a classic example used to demonstrate the exploration versus exploitation dilemma. This post introduces the bandit problem and how to solve it using different exploration strategies. The algorithms are implemented for a Bernoulli bandit in …
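As a concrete illustration of the exploration versus exploitation trade-off described above, here is a minimal epsilon-greedy agent on a three-armed Bernoulli bandit. The arm probabilities, epsilon, and horizon below are made-up illustration values, not taken from any of the papers:

```python
import random

class BernoulliBandit:
    """K arms; pulling arm i returns reward 1 with probability probs[i]."""
    def __init__(self, probs, seed=0):
        self.probs = probs
        self.rng = random.Random(seed)

    def pull(self, i):
        return 1 if self.rng.random() < self.probs[i] else 0

def epsilon_greedy(bandit, n_rounds=5000, eps=0.1, seed=1):
    """With probability eps explore a random arm, otherwise exploit
    the arm with the best running-mean reward estimate."""
    rng = random.Random(seed)
    k = len(bandit.probs)
    counts = [0] * k    # pulls per arm
    values = [0.0] * k  # running mean reward per arm
    for _ in range(n_rounds):
        if rng.random() < eps:
            arm = rng.randrange(k)                        # explore
        else:
            arm = max(range(k), key=lambda i: values[i])  # exploit
        r = bandit.pull(arm)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]    # incremental mean
    return values, counts

values, counts = epsilon_greedy(BernoulliBandit([0.2, 0.5, 0.75]))
```

Over enough rounds the running means converge to the true arm probabilities, and the exploit branch concentrates pulls on the best arm.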
arXiv cs.LG
TIER_1·Maresa Schröder, Pascal Janetzky, Michael Klar, Stefan Feuerriegel·
arXiv:2605.06454v1 Announce Type: new Abstract: Bayesian optimization is widely used for hyperparameter optimization when model evaluations are expensive; however, noisy acquisition estimates can lead to unstable decisions. We identify acquisition estimation noise as a failure mo…
arXiv cs.LG
TIER_1·Becky Langdon, Gabriel D. Patrón, Chrysoula D. Kappatou, Robert M. Lee, Behrang Shafei, Jixiang Qing, Ruth Misener, Mark van der Wilk, Calvin Tsay·
arXiv:2605.05382v1 Announce Type: cross Abstract: The optimisation of fed-batch (bio)chemical process recipes is subject to inherent, underlying, and unmeasurable fluctuations across batches, whose trajectories are difficult to model and costly to measure. Bayesian Optimisation (…
arXiv:2605.04895v1 Announce Type: new Abstract: Published transfer-BO comparisons often estimate an average treatment effect of acquisition choice over hidden regime variables, while practitioners need the conditional effect for their specific prior quality, budget ratio, and met…
arXiv:2605.02409v1 Announce Type: new Abstract: Bayesian Optimization is an iterative method, tailored to optimizing expensive black box objective functions. Surrogate models like Gaussian Processes, which are the gold standard in Bayesian Optimization, can be inefficient for inp…
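One common way to make a surrogate model respect permutation symmetry, in the spirit of the abstract above, is to symmetrize a base kernel by averaging it over all permutations of the input coordinates. The sketch below (a plain-Python RBF base kernel on three-dimensional inputs) is a generic construction, not necessarily the paper's method, and its factorial cost limits it to low dimensions:

```python
import itertools
import math

def rbf(x, y, lengthscale=1.0):
    """Base RBF kernel between two equal-length vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * lengthscale ** 2))

def perm_invariant_kernel(x, y, base=rbf):
    """Average the base kernel over all permutations of one argument.
    Because the RBF is invariant under permuting both arguments jointly,
    the result is invariant to reordering x or y (factorial cost!)."""
    perms = list(itertools.permutations(range(len(x))))
    return sum(base(tuple(x[i] for i in p), y) for p in perms) / len(perms)

k1 = perm_invariant_kernel((1.0, 2.0, 3.0), (0.5, 0.5, 0.5))
k2 = perm_invariant_kernel((3.0, 1.0, 2.0), (0.5, 0.5, 0.5))  # x reordered
# k1 equals k2: reordering the input coordinates leaves the kernel unchanged
```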
arXiv:2507.21803v2 Announce Type: replace Abstract: Carbon Capture and Storage (CCS) stands as a pivotal technology for fostering a sustainable future. The process, which involves injecting supercritical CO$_2$ into underground formations, a method already widely used for Enhance…
arXiv cs.LG
TIER_1·Jesse Schneider, William J. Welch·
arXiv:2605.00237v1 Announce Type: new Abstract: Bayesian optimization is a sequential method for minimizing objective functions that are expensive to evaluate and about which few assumptions can be made. By using all gathered data to train a Gaussian process model for the functio…
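The loop this abstract describes (fit a Gaussian process to all gathered data, then choose the next evaluation) can be sketched with a small NumPy implementation using the expected-improvement acquisition on a 1-D grid. The toy objective, kernel lengthscale, and grid are illustrative choices, not from the paper:

```python
import math
import numpy as np

def kern(a, b, ls=0.3):
    """RBF kernel matrix between two 1-D arrays of points."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ls ** 2))

def gp_posterior(X, y, Xs, noise=1e-6):
    """Posterior mean and stddev of a zero-mean GP at query points Xs."""
    K = kern(X, X) + noise * np.eye(len(X))
    Ks = kern(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = np.clip(1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sd, f_best):
    """EI for minimization: (f_best - mu) * Phi(z) + sd * phi(z)."""
    z = (f_best - mu) / sd
    Phi = np.array([0.5 * (1 + math.erf(zi / math.sqrt(2))) for zi in z])
    phi = np.exp(-z ** 2 / 2) / math.sqrt(2 * math.pi)
    return (f_best - mu) * Phi + sd * phi

def objective(x):  # stand-in for an expensive black box (illustrative)
    return np.sin(3 * x) + 0.5 * x

rng = np.random.default_rng(0)
X = rng.uniform(0, 2, 3)        # small initial design
y = objective(X)
grid = np.linspace(0, 2, 200)
for _ in range(10):             # BO loop: fit GP, maximize EI, evaluate
    mu, sd = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sd, y.min()))]
    X, y = np.append(X, x_next), np.append(y, objective(x_next))
```

Each iteration re-fits the GP to all evaluations so far, which is exactly the "use all gathered data" property the abstract highlights.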
arXiv cs.AI
TIER_1·Ziwei Zhang, Jonathan Yu-Meng Li·
arXiv:2604.27447v1 Announce Type: cross Abstract: Modern stochastic optimization pipelines increasingly rely on learned generative models to represent uncertainty, while downstream decisions are evaluated almost entirely through Monte Carlo scenarios. This shifts the operational …
arXiv:2604.24016v1 Announce Type: new Abstract: We study offline-to-online learning in linear contextual bandits with biased offline regression data: the offline parameter need not match the online one, so history should not be treated as a single warm start. We model directional…
arXiv:2602.17315v2 Announce Type: replace Abstract: We introduce Flickering Multi-Armed Bandits (FMAB) to model sequential decision-making in environments with changing action availability, where accessibility of the next action is restricted to a subset dependent on the agent's …
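The FMAB algorithm itself is the paper's contribution; as a generic baseline for changing action availability, one can run UCB1 restricted each round to the currently available arms (a "sleeping bandit" style heuristic, not the paper's method). The availability process and arm means below are invented for illustration:

```python
import math
import random

def ucb_available(probs, availability, n_rounds=4000, seed=0):
    """UCB1 where only a subset of arms is playable each round.
    availability(rng) -> non-empty set of playable arm indices."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, n_rounds + 1):
        avail = availability(rng)
        untried = [i for i in sorted(avail) if counts[i] == 0]
        if untried:
            arm = untried[0]                 # play each arm once first
        else:                                # UCB index over available arms
            arm = max(avail, key=lambda i:
                      means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        r = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]
    return counts, means

# each arm independently "flickers" available with probability 0.8
def avail_fn(rng):
    s = {i for i in range(3) if rng.random() < 0.8}
    return s or {0}                          # fall back if all unavailable

counts, means = ucb_available([0.3, 0.5, 0.7], avail_fn)
```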
Preference feedback, in the form of pairwise comparisons rather than scalar scores, has seen increasing use in applications such as human-, laboratory-, and expert-in-the-loop design, as well as scientific discovery. We propose a Thompson Sampling (TS) approach to Bayesian optimi…
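In the simpler scalar-feedback Bernoulli setting (not the paper's preference-feedback variant), Thompson Sampling amounts to keeping a Beta posterior per arm, sampling a mean from each posterior, and playing the argmax. The arm probabilities and horizon below are illustrative:

```python
import random

def thompson_bernoulli(probs, n_rounds=5000, seed=0):
    """Thompson Sampling with Beta(1, 1) priors on Bernoulli arms."""
    rng = random.Random(seed)
    k = len(probs)
    a = [1] * k  # posterior alpha = 1 + successes
    b = [1] * k  # posterior beta  = 1 + failures
    counts = [0] * k
    for _ in range(n_rounds):
        draws = [rng.betavariate(a[i], b[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: draws[i])  # play sampled best
        r = 1 if rng.random() < probs[arm] else 0
        a[arm] += r
        b[arm] += 1 - r
        counts[arm] += 1
    return counts

counts = thompson_bernoulli([0.3, 0.5, 0.7])
```

Posterior sampling does the exploration automatically: uncertain arms occasionally produce high draws and get tried, while confidently bad arms are sampled less and less.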
arXiv:2604.22161v1 Announce Type: new Abstract: We study the $K$-armed logistic bandit problem, where at each round, the agent observes $K$ feature vectors associated with $K$ actions. Existing approaches that achieve a rate-optimal $\tilde{\mathcal{O}}(\sqrt{dT})$ regret bound r…
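The logistic bandit's reward model can be simulated directly: the chosen context x yields a Bernoulli reward with mean sigmoid(x . theta*). The sketch below pairs it with an epsilon-greedy policy and an online SGD estimate of theta, a naive baseline rather than the paper's rate-optimal algorithm; all parameters are illustrative:

```python
import math
import random

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def logistic_bandit_sgd(theta_star, n_rounds=20000, K=5, eps=0.1, lr=0.05, seed=0):
    """K-armed logistic bandit: picking context x yields a Bernoulli
    reward with mean sigmoid(x . theta_star). Epsilon-greedy policy with
    an online SGD estimate of theta (a naive baseline)."""
    rng = random.Random(seed)
    d = len(theta_star)
    theta = [0.0] * d
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    hits = 0
    for _ in range(n_rounds):
        contexts = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(K)]
        if rng.random() < eps:
            arm = rng.randrange(K)
        else:
            arm = max(range(K), key=lambda i: dot(contexts[i], theta))
        x = contexts[arm]
        r = 1 if rng.random() < sigmoid(dot(x, theta_star)) else 0
        g = sigmoid(dot(x, theta)) - r       # log-loss gradient scale
        theta = [w - lr * g * xi for w, xi in zip(theta, x)]
        best = max(range(K), key=lambda i: dot(contexts[i], theta_star))
        hits += (arm == best)
    return theta, hits / n_rounds

theta, hit_rate = logistic_bandit_sgd([1.0, -1.0])
```

Because the model is well-specified, the SGD estimate drifts toward the direction of theta*, and the greedy arm increasingly coincides with the truly best arm.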
arXiv:2604.26169v1 Announce Type: cross Abstract: Treatment allocation under budget constraints is a central challenge in digital advertising: advertisers must decide which users to show ads to while spending a limited budget wisely. The standard approach follows a two-stage offl…
arXiv:2604.25269v1 Announce Type: cross Abstract: Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments…
arXiv:2604.25272v1 Announce Type: new Abstract: Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this work, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learn…
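The smoothness notion behind spectral bandits is typically the Dirichlet energy x^T L x of the payoff vector with respect to the graph Laplacian L: small energy means neighboring arms have similar payoffs. A minimal numeric check on a 5-node path graph, with signal values invented for illustration:

```python
import numpy as np

# path graph on 5 nodes; Laplacian L = D - A
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

def dirichlet_energy(x):
    """x^T L x = sum over edges of (x_i - x_j)^2; small for smooth signals."""
    return float(x @ L @ x)

smooth = np.array([1.0, 1.1, 1.2, 1.3, 1.4])   # varies slowly along the path
rough = np.array([1.0, -1.0, 1.0, -1.0, 1.0])  # flips sign at every edge
e_smooth = dirichlet_energy(smooth)  # 4 edges * 0.1^2 = 0.04
e_rough = dirichlet_energy(rough)    # 4 edges * 2^2   = 16.0
```

An algorithm that assumes small x^T L x can share information across neighboring arms instead of learning each arm's payoff independently.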
arXiv stat.ML
TIER_1·Tomas Kocak, Gergely Neu, Michal Valko, Remi Munos·
arXiv:2604.24555v1 Announce Type: cross Abstract: We consider online learning problems under a partial observability model capturing situations where the information conveyed to the learner is between full information and bandit feedback. In the simplest variant, we assume that i…
arXiv:2602.00417v2 Announce Type: replace Abstract: We present the first algorithms for generalized linear contextual bandits under shuffle differential privacy and joint differential privacy. While prior work on private contextual bandits has been restricted to linear reward mod…
arXiv:2501.19277v4 Announce Type: replace Abstract: We study online assortment optimization under stochastic choice when a decision maker simultaneously values cumulative revenue performance and the quality of post-hoc inference on revenue contrasts. We analyze a forced-explorati…
arXiv:2604.22140v1 Announce Type: new Abstract: We study stochastic multi-armed bandits in which the objective is a statistical functional of the long-run reward distribution, rather than expected reward alone. Under mild continuity assumptions, we show that the infinite-horizon …
In many application domains (e.g., recommender systems, intelligent tutoring systems), the rewards associated with the actions tend to decrease over time. This decay is either caused by the actions executed in the past (e.g., a user may get bored when songs of the same genre are re…
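A simple way to cope with rewards that decay as an arm is pulled, in the spirit of the abstract above, is to score each arm by a sliding-window mean so that stale high rewards age out. This is a generic heuristic, not the paper's method; the decay schedule, noise level, and horizon are invented:

```python
import random
from collections import deque

def sliding_window_greedy(decay_means, n_rounds=3000, window=100, eps=0.1, seed=0):
    """Two-armed bandit whose expected reward decays with pulls.
    decay_means(arm, n_pulls) -> current mean reward of `arm`.
    Arms are scored by a sliding-window mean so stale rewards age out."""
    rng = random.Random(seed)
    k = 2
    pulls = [0] * k
    recent = [deque(maxlen=window) for _ in range(k)]
    for _ in range(n_rounds):
        if rng.random() < eps or any(len(q) == 0 for q in recent):
            arm = rng.randrange(k)           # explore / fill empty windows
        else:
            arm = max(range(k), key=lambda i: sum(recent[i]) / len(recent[i]))
        reward = decay_means(arm, pulls[arm]) + rng.gauss(0, 0.05)
        pulls[arm] += 1
        recent[arm].append(reward)
    return pulls

# arm 0 starts high but rots with each pull; arm 1 is steady (illustrative)
def decay(arm, n):
    return max(0.9 - 0.01 * n, 0.1) if arm == 0 else 0.5

pulls = sliding_window_greedy(decay)
```

Once arm 0 has rotted below arm 1's level, its window mean drops and the policy switches to the steady arm, which a plain all-history mean would do much more slowly.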