New bandit algorithms tackle replicability, kernel complexity, and streaming data

By PulseAugur Editorial · [14 sources] · 2026-06-08 03:58

Several new arXiv papers report advances in multi-armed bandit algorithms. One studies replicable algorithms with UCB (upper confidence bound) based exploration for stochastic and linear bandits and reports improved regret guarantees. Another places Gaussian-process UCB (GP-UCB) and decision-estimation-coefficient (DEC) methods for kernel bandits in a common algorithmic-information language, distinguishing algorithmic from minimax complexity. Further papers address single-pass sliding-window streaming bandits with limited memory and contextual queueing bandits, reporting improved regret rates and characterizing minimax dependencies.
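For readers unfamiliar with UCB-style exploration, the following is a minimal sketch of the textbook UCB1 rule on Bernoulli arms. It is background only and does not implement the replicable, kernel, or streaming variants from the papers covered here. The function name `ucb1`, the Bernoulli reward model, and the `seed` parameter are assumptions made for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run textbook UCB1 on Bernoulli arms (illustrative assumption).

    Returns (pull counts per arm, cumulative pseudo-regret).
    """
    rng = random.Random(seed)          # fixed seed makes a run repeatable
    n_arms = len(arm_means)
    counts = [0] * n_arms              # number of pulls per arm
    sums = [0.0] * n_arms              # total observed reward per arm
    best = max(arm_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1                # pull each arm once to initialise
        else:
            # empirical mean plus an exploration bonus that shrinks with pulls
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - arm_means[arm]  # pseudo-regret uses the true means
    return counts, regret


if __name__ == "__main__":
    pulls, total_regret = ucb1([0.1, 0.9], horizon=2000, seed=1)
    print(pulls, total_regret)
```

The exploration bonus is what keeps regret sublinear in this classical setting: rarely pulled arms keep a wide confidence interval and are revisited until the data rule them out.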

IMPACT Advances in bandit algorithms can lead to more efficient online learning systems for recommendation engines, resource allocation, and experimentation platforms.

RANK_REASON Multiple academic papers published on arXiv detailing new algorithms and theoretical analyses in the field of multi-armed bandits.


AI-generated summary · Google Gemini · from 14 sources.

COVERAGE [14]

arXiv cs.LG TIER_1 English(EN) · José Niño-Mora · 2026-06-11 04:00

Restless bandits with imperfect binary feedback: PCL-indexability analysis and computation

arXiv:2606.11192v1 Announce Type: new Abstract: We study restless bandits with binary latent states and imperfect binary feedback, motivated by opportunistic spectrum access with sensing errors. For the associated belief-state model, we develop a partial conservation laws (PCL)-b…
arXiv cs.LG TIER_1 English(EN) · Rohan Deb, Udaya Ghai, Karan Singh, Arindam Banerjee · 2026-06-10 04:00

Replicable Bandits with UCB based Exploration

arXiv:2604.20024v2 Announce Type: replace Abstract: We study replicable algorithms for stochastic multi-armed bandits (MAB) and linear bandits with UCB (Upper Confidence Bound) based exploration. A bandit algorithm is $\rho$-replicable if two executions using shared internal rand…
arXiv cs.LG TIER_1 English(EN) · Yunbei Xu · 2026-06-10 04:00

Algorithmic and Minimax Complexities in Kernel Bandits

arXiv:2606.11171v1 Announce Type: new Abstract: Gaussian-process upper confidence bound (GP-UCB) and decision-estimation-coefficient (DEC) methods may appear, at first sight, to belong to different theories. This paper places the two viewpoints in a common algorithmic-information…
arXiv cs.LG TIER_1 English(EN) · Yunbei Xu · 2026-06-09 17:49

Algorithmic and Minimax Complexities in Kernel Bandits

Gaussian-process upper confidence bound (GP-UCB) and decision-estimation-coefficient (DEC) methods may appear, at first sight, to belong to different theories. This paper places the two viewpoints in a common algorithmic-information language for frequentist RKHS bandits. GP-UCB f…
arXiv cs.LG TIER_1 English(EN) · Seoungbin Bae, Dabeen Lee · 2026-06-09 04:00

Algorithm for Contextual Queueing Bandits with Rate-Optimal Queue Length Regret

arXiv:2606.09668v1 Announce Type: new Abstract: Contextual queueing bandits provide a framework for learning to schedule heterogeneous jobs under unknown context-dependent service rates. Under stochastic contexts, existing algorithms achieve $\widetilde{\mathcal{O}}(T^{-1/4})$ qu…
arXiv cs.LG TIER_1 English(EN) · Vladimir Braverman, Chen Wang, Liudeng Wang, Samson Zhou · 2026-06-09 04:00

Online Learning with Recency: Algorithms for Sliding-window Streaming Multi-armed Bandits

arXiv:2606.08977v1 Announce Type: new Abstract: Motivated by the recency effect in online learning, we study algorithms for single-pass *sliding-window streaming multi-armed bandits (MABs)* in this paper. In this setting, we are given $n$ arms with unknown sub-Gaussian reward dis…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-08 17:53

Bandits for Efficient Experimentation: Adapting to Control Group, Preferences, and Context Drifts

We consider a variant of the linear contextual stochastic multi-armed bandits, where the learner must provide recommendations to a group of users, each having its personalized preference vector, and in the presence of context distributions that are drifting over time. Under pract…
arXiv cs.LG TIER_1 English(EN) · Dabeen Lee · 2026-06-08 15:51

Algorithm for Contextual Queueing Bandits with Rate-Optimal Queue Length Regret

Contextual queueing bandits provide a framework for learning to schedule heterogeneous jobs under unknown context-dependent service rates. Under stochastic contexts, existing algorithms achieve $\widetilde{\mathcal{O}}(T^{-1/4})$ queue length regret, defined as the expected diffe…
arXiv stat.ML TIER_1 (CA) · Linzhe He, Yu-Jie Zhang, Sifan Yang, Lijun Zhang · 2026-06-11 04:00

Efficient Multinomial Logistic Bandit via Frequent Directions

arXiv:2606.11968v1 Announce Type: cross Abstract: This paper studies efficient online algorithms for multinomial logistic bandits (MLogB), where the feedback distribution over $K+1$ outcomes follows a multinomial logistic model of $d$-dimensional action vectors. A representative …
arXiv stat.ML TIER_1 (CA) · Lijun Zhang · 2026-06-10 11:47

Efficient Multinomial Logistic Bandit via Frequent Directions

This paper studies efficient online algorithms for multinomial logistic bandits (MLogB), where the feedback distribution over $K+1$ outcomes follows a multinomial logistic model of $d$-dimensional action vectors. A representative UCB-type algorithm, OFUL-MLogB, achieves a regret …
arXiv stat.ML TIER_1 English(EN) · Udvas Das, Waris Radji, Debabrota Basu, Odalric-Ambrym Maillard · 2026-06-09 04:00

Bandits for Efficient Experimentation: Adapting to Control Group, Preferences, and Context Drifts

arXiv:2606.09802v1 Announce Type: cross Abstract: We consider a variant of the linear contextual stochastic multi-armed bandits, where the learner must provide recommendations to a group of users, each having its personalized preference vector, and in the presence of context dist…
arXiv stat.ML TIER_1 English(EN) · Deqi Zheng, Xiaoyang Xu, Yuhong Yang · 2026-06-09 04:00

Multi-Armed Bandits with Arriving Arms: Sequential Screening, Dynamic Regret, and Sublinear Guarantees

arXiv:2606.09002v1 Announce Type: new Abstract: We study a stochastic multi-armed bandit problem in which the set of available arms expands over time. This setting arises in sequential experimentation when new actions or treatments become available during an ongoing study, making…
arXiv stat.ML TIER_1 English(EN) · Odalric-Ambrym Maillard · 2026-06-08 17:53

Bandits for Efficient Experimentation: Adapting to Control Group, Preferences, and Context Drifts

We consider a variant of the linear contextual stochastic multi-armed bandits, where the learner must provide recommendations to a group of users, each having its personalized preference vector, and in the presence of context distributions that are drifting over time. Under pract…
arXiv stat.ML TIER_1 English(EN) · Yuhong Yang · 2026-06-08 03:58

Multi-Armed Bandits with Arriving Arms: Sequential Screening, Dynamic Regret, and Sublinear Guarantees

We study a stochastic multi-armed bandit problem in which the set of available arms expands over time. This setting arises in sequential experimentation when new actions or treatments become available during an ongoing study, making regret against a single best arm in hindsight i…
