PulseAugur

New bandit algorithms tackle replicability, kernel complexity, and streaming data

Several new papers report advances in multi-armed bandit algorithms. One study introduces replicable UCB-based exploration methods for stochastic and linear bandits, with improved regret guarantees. Another places Gaussian-process UCB (GP-UCB) and decision-estimation-coefficient (DEC) methods for kernel bandits in a common framework, distinguishing algorithmic from minimax complexity. Further work addresses sliding-window streaming bandits under limited memory and contextual queueing bandits, reporting improved regret rates and characterizing minimax dependencies.

IMPACT Advances in bandit algorithms can lead to more efficient online learning systems for recommendation engines, resource allocation, and experimentation platforms.

RANK_REASON Multiple academic papers published on arXiv detailing new algorithms and theoretical analyses in the field of multi-armed bandits.


AI-generated summary · Google Gemini · from 11 sources.

COVERAGE [11]

  1. arXiv cs.LG TIER_1 English(EN) · Rohan Deb, Udaya Ghai, Karan Singh, Arindam Banerjee ·

    Replicable Bandits with UCB based Exploration

    arXiv:2604.20024v2. We study replicable algorithms for stochastic multi-armed bandits (MAB) and linear bandits with UCB (Upper Confidence Bound) based exploration. A bandit algorithm is $\rho$-replicable if two executions using shared internal rand…

  2. arXiv cs.LG TIER_1 English(EN) · Yunbei Xu ·

    Algorithmic and Minimax Complexities in Kernel Bandits

    arXiv:2606.11171v1. Gaussian-process upper confidence bound (GP-UCB) and decision-estimation-coefficient (DEC) methods may appear, at first sight, to belong to different theories. This paper places the two viewpoints in a common algorithmic-information…

  3. arXiv cs.LG TIER_1 English(EN) · Yunbei Xu ·

    Algorithmic and Minimax Complexities in Kernel Bandits

    Gaussian-process upper confidence bound (GP-UCB) and decision-estimation-coefficient (DEC) methods may appear, at first sight, to belong to different theories. This paper places the two viewpoints in a common algorithmic-information language for frequentist RKHS bandits. GP-UCB f…

  4. arXiv cs.LG TIER_1 English(EN) · Vladimir Braverman, Chen Wang, Liudeng Wang, Samson Zhou ·

    Online Learning with Recency: Algorithms for Sliding-window Streaming Multi-armed Bandits

    arXiv:2606.08977v1. Motivated by the recency effect in online learning, we study algorithms for single-pass *sliding-window streaming multi-armed bandits (MABs)* in this paper. In this setting, we are given $n$ arms with unknown sub-Gaussian reward dis…

  5. arXiv cs.LG TIER_1 English(EN) · Seoungbin Bae, Dabeen Lee ·

    Algorithm for Contextual Queueing Bandits with Rate-Optimal Queue Length Regret

    arXiv:2606.09668v1. Contextual queueing bandits provide a framework for learning to schedule heterogeneous jobs under unknown context-dependent service rates. Under stochastic contexts, existing algorithms achieve $\widetilde{\mathcal{O}}(T^{-1/4})$ qu…

  6. Hugging Face Daily Papers TIER_1 English(EN) ·

    Bandits for Efficient Experimentation: Adapting to Control Group, Preferences, and Context Drifts

    We consider a variant of the linear contextual stochastic multi-armed bandits, where the learner must provide recommendations to a group of users, each having its personalized preference vector, and in the presence of context distributions that are drifting over time. Under pract…

  7. arXiv cs.LG TIER_1 English(EN) · Dabeen Lee ·

    Algorithm for Contextual Queueing Bandits with Rate-Optimal Queue Length Regret

    Contextual queueing bandits provide a framework for learning to schedule heterogeneous jobs under unknown context-dependent service rates. Under stochastic contexts, existing algorithms achieve $\widetilde{\mathcal{O}}(T^{-1/4})$ queue length regret, defined as the expected diffe…

  8. arXiv stat.ML TIER_1 English(EN) · Deqi Zheng, Xiaoyang Xu, Yuhong Yang ·

    Multi-Armed Bandits with Arriving Arms: Sequential Screening, Dynamic Regret, and Sublinear Guarantees

    arXiv:2606.09002v1. We study a stochastic multi-armed bandit problem in which the set of available arms expands over time. This setting arises in sequential experimentation when new actions or treatments become available during an ongoing study, making…

  9. arXiv stat.ML TIER_1 English(EN) · Udvas Das, Waris Radji, Debabrota Basu, Odalric-Ambrym Maillard ·

    Bandits for Efficient Experimentation: Adapting to Control Group, Preferences, and Context Drifts

    arXiv:2606.09802v1. We consider a variant of the linear contextual stochastic multi-armed bandits, where the learner must provide recommendations to a group of users, each having its personalized preference vector, and in the presence of context dist…

  10. arXiv stat.ML TIER_1 English(EN) · Odalric-Ambrym Maillard ·

    Bandits for Efficient Experimentation: Adapting to Control Group, Preferences, and Context Drifts

    We consider a variant of the linear contextual stochastic multi-armed bandits, where the learner must provide recommendations to a group of users, each having its personalized preference vector, and in the presence of context distributions that are drifting over time. Under pract…

  11. arXiv stat.ML TIER_1 English(EN) · Yuhong Yang ·

    Multi-Armed Bandits with Arriving Arms: Sequential Screening, Dynamic Regret, and Sublinear Guarantees

    We study a stochastic multi-armed bandit problem in which the set of available arms expands over time. This setting arises in sequential experimentation when new actions or treatments become available during an ongoing study, making regret against a single best arm in hindsight i…