Brief · PulseAugur

RESEARCH · arXiv cs.LG English(EN) · 2d · [11 sources]

Algorithm for Contextual Queueing Bandits with Rate-Optimal Queue Length Regret

Researchers have published several new papers detailing advancements in multi-armed bandit algorithms. One study introduces replicable UCB-based exploration methods for stochastic and linear bandits, improving regret guarantees. Another paper unifies Gaussian-process UCB and decision-estimation-coefficient methods for kernel bandits, highlighting the distinction between algorithmic information and minimax complexity. Additionally, new algorithms address sliding-window streaming bandits with limited memory and contextual queueing bandits, achieving improved regret rates and characterizing minimax dependencies.

IMPACT Advances in bandit algorithms can lead to more efficient online learning systems for recommendation engines, resource allocation, and experimentation platforms.

UCB for Arriving Arms (UCB-AA)
CQB-η-2
arXiv
RepRidge
RepLinUCB
RepGLM
RepGLMUCB
decision-estimation-coefficient (DEC)
Gaussian-process upper confidence bound (GP-UCB)
RepUCB