PulseAugur
English(EN) The Multi-Armed Bandit Problem and Its Solutions

Researchers advance Bayesian optimization for efficient decision-making and hyperparameter tuning

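Several recent arXiv papers explore advances in the multi-armed bandit problem, a framework for sequential decision-making under uncertainty. The work includes handling changing action availability in "flickering multi-armed bandits" and improving regret bounds for logistic bandits without strict context diversity assumptions. Other papers focus on geometry-aware offline-to-online learning, spectral bandits for smooth functions on graphs, and privacy-preserving algorithms for generalized linear contextual bandits.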
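For readers new to the setting, the sketch below illustrates what "sequential decision-making under uncertainty" and "regret" mean in the simplest case: a Bernoulli bandit played with epsilon-greedy exploration. It is a textbook baseline chosen for illustration, not the method of any paper listed here; the function name `run_epsilon_greedy`, its parameters, and the arm probabilities are all hypothetical.

```python
import random


def run_epsilon_greedy(true_probs, horizon, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit with epsilon-greedy exploration.

    Returns (pull counts per arm, cumulative pseudo-regret).
    `true_probs` are the hidden success probabilities (illustrative values).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward observed per arm
    best = max(true_probs)       # mean reward of the best arm
    regret = 0.0

    for _ in range(horizon):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated mean.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The environment returns a 0/1 reward from the chosen arm.
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0

        # Incremental update of the sample mean for that arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

        # Pseudo-regret: expected reward lost versus always playing the best arm.
        regret += best - true_probs[arm]

    return counts, regret


if __name__ == "__main__":
    counts, regret = run_epsilon_greedy([0.2, 0.5, 0.8], horizon=1000)
    print("pulls per arm:", counts)
    print("cumulative pseudo-regret:", round(regret, 2))
```

Regret bounds such as the ones mentioned above describe how slowly this cumulative gap can be made to grow with the number of rounds; a fixed `epsilon` keeps exploring forever, so its regret grows linearly, which is one reason more refined strategies are studied.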

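Impact: Advances in bandit algorithms could lead to more efficient online learning systems and better decision-making in recommendation, advertising, and resource allocation.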

Ranking rationale: Multiple arXiv papers report a range of theoretical advances in multi-armed bandit algorithms.

Read on Lil'Log (Lilian Weng) →

AI-generated summary · Google Gemini · from 33 sources.


Sources [33]

  1. Lil'Log (Lilian Weng) TIER_1 English(EN) ·

    The Multi-Armed Bandit Problem and Its Solutions

    The multi-armed bandit problem is a classic example to demonstrate the exploration versus exploitation dilemma. This post introduces the bandit problem and how to solve it using different exploration strategies. The algorithms are implemented for Bernoulli bandit in …

  2. arXiv cs.LG TIER_1 English(EN) · Maresa Schröder, Pascal Janetzky, Michael Klar, Stefan Feuerriegel ·

    ORTHOBO: Orthogonal Bayesian Hyperparameter Optimization

    arXiv:2605.06454v1 Announce Type: new Abstract: Bayesian optimization is widely used for hyperparameter optimization when model evaluations are expensive; however, noisy acquisition estimates can lead to unstable decisions. We identify acquisition estimation noise as a failure mo…

  3. arXiv cs.LG TIER_1 English(EN) · Becky Langdon, Gabriel D. Patrón, Chrysoula D. Kappatou, Robert M. Lee, Behrang Shafei, Jixiang Qing, Ruth Misener, Mark van der Wilk, Calvin Tsay ·

    Meta-learning for sample-efficient Bayesian optimisation of fed-batch processes

    arXiv:2605.05382v1 Announce Type: cross Abstract: The optimisation of fed-batch (bio)chemical process recipes is subject to inherent, underlying, and unmeasurable fluctuations across batches, whose trajectories are difficult to model and costly to measure. Bayesian Optimisation (…

  4. arXiv cs.AI TIER_1 English(EN) · Stefan Feuerriegel ·

    ORTHOBO: Orthogonal Bayesian Hyperparameter Optimization

    Bayesian optimization is widely used for hyperparameter optimization when model evaluations are expensive; however, noisy acquisition estimates can lead to unstable decisions. We identify acquisition estimation noise as a failure mode that was previously overlooked: even when the…

  5. arXiv cs.LG TIER_1 English(EN) · Noel Thomas ·

    Regime-Conditioned Evaluation in Multi-Context Bayesian Optimization

    arXiv:2605.04895v1 Announce Type: new Abstract: Published transfer-BO comparisons often estimate an average treatment effect of acquisition choice over hidden regime variables, while practitioners need the conditional effect for their specific prior quality, budget ratio, and met…

  6. arXiv cs.LG TIER_1 English(EN) · Sofianos Panagiotis Fotias, Vassilis Gaganis ·

    Inducing Permutation Invariant Priors in Bayesian Optimization for Carbon Capture and Storage Applications

    arXiv:2605.02409v1 Announce Type: new Abstract: Bayesian Optimization is an iterative method, tailored to optimizing expensive black box objective functions. Surrogate models like Gaussian Processes, which are the gold standard in Bayesian Optimization, can be inefficient for inp…

  7. arXiv cs.LG TIER_1 English(EN) · Sofianos Panagiotis Fotias, Vassilis Gaganis ·

    Bayesian Neural Network Surrogates for Bayesian Optimization of Carbon Capture and Storage Operations

    arXiv:2507.21803v2 Announce Type: replace Abstract: Carbon Capture and Storage (CCS) stands as a pivotal technology for fostering a sustainable future. The process, which involves injecting supercritical CO$_2$ into underground formations, a method already widely used for Enhance…

  8. arXiv cs.LG TIER_1 English(EN) · Vassilis Gaganis ·

    Inducing Permutation Invariant Priors in Bayesian Optimization for Carbon Capture and Storage Applications

    Bayesian Optimization is an iterative method, tailored to optimizing expensive black box objective functions. Surrogate models like Gaussian Processes, which are the gold standard in Bayesian Optimization, can be inefficient for inputs with permutation symmetries, as the most com…

  9. arXiv cs.LG TIER_1 English(EN) · Jesse Schneider, William J. Welch ·

    Bayesian Optimization in Linear Time

    arXiv:2605.00237v1 Announce Type: new Abstract: Bayesian optimization is a sequential method for minimizing objective functions that are expensive to evaluate and about which few assumptions can be made. By using all gathered data to train a Gaussian process model for the functio…

  10. arXiv cs.AI TIER_1 English(EN) · Ziwei Zhang, Jonathan Yu-Meng Li ·

    Sampler-Robust Optimization under Generative Models

    arXiv:2604.27447v1 Announce Type: cross Abstract: Modern stochastic optimization pipelines increasingly rely on learned generative models to represent uncertainty, while downstream decisions are evaluated almost entirely through Monte Carlo scenarios. This shifts the operational …

  11. Hugging Face Daily Papers TIER_1 English(EN) ·

    Sampler-Robust Optimization under Generative Models

    Modern stochastic optimization pipelines increasingly rely on learned generative models to represent uncertainty, while downstream decisions are evaluated almost entirely through Monte Carlo scenarios. This shifts the operational object of uncertainty from an explicit probability…

  12. arXiv cs.LG TIER_1 English(EN) · Zean Han, Ruihan Lin, Zezhen Ding, Jiheng Zhang ·

    Geometry-Aware Offline-to-Online Learning in Linear Contextual Bandits

    arXiv:2604.24016v1 Announce Type: new Abstract: We study offline-to-online learning in linear contextual bandits with biased offline regression data: the offline parameter need not match the online one, so history should not be treated as a single warm start. We model directional…

  13. arXiv cs.LG TIER_1 English(EN) · Sourav Chakraborty, Amit Kiran Rege, Claire Monteleoni, Lijun Chen ·

    Flickering Multi-Armed Bandits

    arXiv:2602.17315v2 Announce Type: replace Abstract: We introduce Flickering Multi-Armed Bandits (FMAB) to model sequential decision-making in environments with changing action availability, where accessibility of the next action is restricted to a subset dependent on the agent's …

  14. Hugging Face Daily Papers TIER_1 English(EN) ·

    A Finite Time Analysis of Thompson Sampling for Bayesian Optimization with Preferential Feedback

    Preference feedback, in the form of pairwise comparisons rather than scalar scores, has seen increasing use in applications such as human-, laboratory-, and expert-in-the-loop design, as well as scientific discovery. We propose a Thompson Sampling (TS) approach to Bayesian optimi…

  15. arXiv cs.LG TIER_1 English(EN) · Seoungbin Bae, Dabeen Lee ·

    Logistic Bandits with $\tilde{O}(\sqrt{dT})$ Regret without Context Diversity Assumptions

    arXiv:2604.22161v1 Announce Type: new Abstract: We study the $K$-armed logistic bandit problem, where at each round, the agent observes $K$ feature vectors associated with $K$ actions. Existing approaches that achieve a rate-optimal $\tilde{\mathcal{O}}(\sqrt{dT})$ regret bound r…

  16. Hugging Face Daily Papers TIER_1 English(EN) ·

    Geometry-Aware Offline-to-Online Learning in Linear Contextual Bandits

    We study offline-to-online learning in linear contextual bandits with biased offline regression data: the offline parameter need not match the online one, so history should not be treated as a single warm start. We model directional transfer with a shift certificate $(M_{\mathrm{…

  17. arXiv cs.LG TIER_1 English(EN) · Dabeen Lee ·

    Logistic Bandits with $\tilde{O}(\sqrt{dT})$ Regret without Context Diversity Assumptions

    We study the $K$-armed logistic bandit problem, where at each round, the agent observes $K$ feature vectors associated with $K$ actions. Existing approaches that achieve a rate-optimal $\tilde{\mathcal{O}}(\sqrt{dT})$ regret bound rely heavily on context diversity assumptions, su…

  18. arXiv stat.ML TIER_1 English(EN) · Noel Thomas ·

    Regime-Conditioned Evaluation in Multi-Context Bayesian Optimization

    Published transfer-BO comparisons often estimate an average treatment effect of acquisition choice over hidden regime variables, while practitioners need the conditional effect for their specific prior quality, budget ratio, and metric. An audit of 40 transfer-BO papers from Neur…

  19. arXiv stat.ML TIER_1 English(EN) · Abhirami Pillai ·

    Budget-Constrained Causal Bandits: Bridging Uplift Modeling and Sequential Decision-Making

    arXiv:2604.26169v1 Announce Type: cross Abstract: Treatment allocation under budget constraints is a central challenge in digital advertising: advertisers must decide which users to show ads to while spending a limited budget wisely. The standard approach follows a two-stage offl…

  20. arXiv stat.ML TIER_1 English(EN) · Gergely Neu, Michal Valko ·

    Online combinatorial optimization with stochastic decision sets and adversarial losses

    arXiv:2604.25269v1 Announce Type: cross Abstract: Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments…

  21. arXiv stat.ML TIER_1 English(EN) · Tomáš Kocák, Rémi Munos, Branislav Kveton, Shipra Agrawal, Michal Valko ·

    Spectral bandits

    arXiv:2604.25272v1 Announce Type: new Abstract: Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this work, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learn…

  22. arXiv stat.ML TIER_1 English(EN) · Abhirami Pillai ·

    Budget-Constrained Causal Bandits: Bridging Uplift Modeling and Sequential Decision-Making

    Treatment allocation under budget constraints is a central challenge in digital advertising: advertisers must decide which users to show ads to while spending a limited budget wisely. The standard approach follows a two-stage offline pipeline - first collect historical data to es…

  23. arXiv stat.ML TIER_1 English(EN) · Michal Valko ·

    Spectral bandits

    Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this work, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as conten…

  24. arXiv stat.ML TIER_1 English(EN) · Michal Valko ·

    Online combinatorial optimization with stochastic decision sets and adversarial losses

    Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked or goods that are out of stoc…

  25. arXiv stat.ML TIER_1 English(EN) · Tomas Kocak, Gergely Neu, Michal Valko, Remi Munos ·

    Efficient learning by implicit exploration in bandit problems with side observations

    arXiv:2604.24555v1 Announce Type: cross Abstract: We consider online learning problems under a partial observability model capturing situations where the information conveyed to the learner is between full information and bandit feedback. In the simplest variant, we assume that i…

  26. arXiv stat.ML TIER_1 English(EN) · Sahasrajit Sarmasarkar ·

    Shuffle and Joint Differential Privacy for Generalized Linear Contextual Bandits

    arXiv:2602.00417v2 Announce Type: replace Abstract: We present the first algorithms for generalized linear contextual bandits under shuffle differential privacy and joint differential privacy. While prior work on private contextual bandits has been restricted to linear reward mod…

  27. arXiv stat.ML TIER_1 English(EN) · Remi Munos ·

    Efficient learning by implicit exploration in bandit problems with side observations

    We consider online learning problems under a partial observability model capturing situations where the information conveyed to the learner is between full information and bandit feedback. In the simplest variant, we assume that in addition to its own loss, the learner also gets …

  28. arXiv stat.ML TIER_1 English(EN) · Jierui Zuo, Hanzhang Qin ·

    On Pareto Optimality for Parametric Choice Bandits

    arXiv:2501.19277v4 Announce Type: replace Abstract: We study online assortment optimization under stochastic choice when a decision maker simultaneously values cumulative revenue performance and the quality of post-hoc inference on revenue contrasts. We analyze a forced-explorati…

  29. arXiv stat.ML TIER_1 English(EN) · Matías Carrasco, Alejandro Cholaquidis ·

    Concave Statistical Utility Maximization Bandits via Influence-Function Gradients

    arXiv:2604.22140v1 Announce Type: new Abstract: We study stochastic multi-armed bandits in which the objective is a statistical functional of the long-run reward distribution, rather than expected reward alone. Under mild continuity assumptions, we show that the infinite-horizon …

  30. arXiv stat.ML TIER_1 English(EN) · Alejandro Cholaquidis ·

    Concave Statistical Utility Maximization Bandits via Influence-Function Gradients

    We study stochastic multi-armed bandits in which the objective is a statistical functional of the long-run reward distribution, rather than expected reward alone. Under mild continuity assumptions, we show that the infinite-horizon problem reduces to optimizing over stationary mi…

  31. arXiv stat.ML TIER_1 English(EN) · Alejandro Cholaquidis ·

    Concave Statistical Utility Maximization Bandits via Influence-Function Gradients

    We study stochastic multi-armed bandits in which the objective is a statistical functional of the long-run reward distribution, rather than expected reward alone. Under mild continuity assumptions, we show that the infinite-horizon problem reduces to optimizing over stationary mi…

  32. arXiv stat.ML TIER_1 English(EN) · Michal Valko ·

    A single algorithm for both restless and rested rotting bandits

    In many application domains (e.g., recommender systems, intelligent tutoring systems), the rewards associated to the actions tend to decrease over time. This decay is either caused by the actions executed in the past (e.g., a user may get bored when songs of the same genre are re…

  33. Eugene Yan TIER_1 English(EN) ·

    Bandits for Recommender Systems

    Industry examples, exploration strategies, warm-starting, off-policy evaluation, and more.