New bandit algorithms tackle adversarial attacks and complex applications

By PulseAugur Editorial · [33 sources] · 2026-05-26 04:00

Researchers are exploring new frontiers in bandit algorithms, focusing on their application and robustness in complex scenarios. One paper investigates adversarial attacks on high-dimensional offline bandits, revealing vulnerabilities in reward models used for evaluating generative AI. Other research delves into theoretical advancements, such as variance-sensitive Thompson sampling, finite-time regret analysis for retry-aware bandits, and improved algorithms for adversarial linear contextual bandits. Additionally, studies are examining bandit applications in latent-state environments, dueling bandits with delayed feedback, and even deep brain stimulation, highlighting the algorithm's versatility. AI

IMPACT Advances in bandit algorithms enhance evaluation of generative models and open new avenues for AI applications in healthcare and recommendation systems.

RANK_REASON Multiple arXiv papers detailing theoretical advancements and applications of bandit algorithms.

Read on arXiv cs.LG →

paper
other

AI-generated summary · Google Gemini · from 33 sources. How we write summaries →

New bandit algorithms tackle adversarial attacks and complex applications

COVERAGE [33]

arXiv cs.LG TIER_1 Deutsch(DE) · Sourav Chakraborty, Amit Kiran Rege, Claire Monteleoni, Lijun Chen · 2026-06-05 04:00

Multi-Agent Lipschitz Bandits

arXiv:2602.16965v2 Announce Type: replace Abstract: We study the decentralized multi-player stochastic bandit problem over a continuous, Lipschitz-structured action space where hard collisions yield zero reward. Our objective is to design a communication-free policy that maximize…
arXiv cs.LG TIER_1 English(EN) · Chaitanya Kharyal, Calarina Muslimani, Matthew E. Taylor · 2026-06-05 04:00

Reward Learning through Ranking Mean Squared Error

arXiv:2601.09236v3 Announce Type: replace Abstract: Reward design remains a significant bottleneck in applying reinforcement learning (RL) to real-world problems. A popular alternative is reward learning, where reward functions are inferred from human feedback rather than manuall…
arXiv cs.LG TIER_1 English(EN) · Mingyang Liu, Asuman Ozdaglar, Tiancheng Yu, Kaiqing Zhang · 2026-06-05 04:00

Regret Minimization with Adaptive Opponents in Repeated Games

arXiv:2606.06486v1 Announce Type: new Abstract: In this paper, we study regret minimization in repeated games with \emph{adaptive} opponents who can respond based on histories of play. The standard metric of \emph{external regret} in online learning is known to fail to capture su…
arXiv cs.AI TIER_1 English(EN) · Kaiqing Zhang · 2026-06-04 17:59

Regret Minimization with Adaptive Opponents in Repeated Games

In this paper, we study regret minimization in repeated games with \emph{adaptive} opponents who can respond based on histories of play. The standard metric of \emph{external regret} in online learning is known to fail to capture such adaptivity. To account for players' counterfa…
arXiv cs.AI TIER_1 English(EN) · Seyed Mohammad Hadi Hosseini, Amir Najafi, Mahdieh Soleymani Baghshah · 2026-06-04 04:00

Efficient Adversarial Attacks on High-dimensional Offline Bandits

arXiv:2602.01658v2 Announce Type: replace-cross Abstract: Bandit algorithms have recently emerged as a powerful tool for evaluating machine learning models, including generative image models and large language models, by efficiently identifying top-performing candidates without e…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-04 00:00

Regret Minimization with Adaptive Opponents in Repeated Games

Repeated policy regret provides a game-theoretic framework for analyzing adaptive opponents in repeated games, offering stronger equilibrium guarantees than traditional external regret through novel non-convex optimization algorithms.
arXiv cs.LG TIER_1 English(EN) · Katherine Avery, Chinmay Pendse, David Jensen · 2026-06-02 04:00

Evaluating and Learning Robust Bandit Policies Under Uncertain Causal Mechanisms

arXiv:2508.02812v3 Announce Type: replace Abstract: Causal graphical models can encode large amounts structural knowledge, both from the background knowledge of domain experts and the structural knowledge discovered from randomized experiments or observational data. However, thou…
arXiv cs.LG TIER_1 English(EN) · Tom Perneczky, Marc Abeille, David Janz · 2026-06-02 04:00

Variance-sensitive Thompson sampling for generalised linear bandits, revisited

arXiv:2606.00431v1 Announce Type: new Abstract: We prove a variance-sensitive regret bound for Thompson sampling in stochastic generalised linear bandits. The argument assumes a warm-up, after which the regret is controlled through using the Gaussian Poincar\'e inequality. This b…
arXiv cs.LG TIER_1 English(EN) · Tim van Erven, Jack Mayo, Julia Olkhovskaya, Chen-Yu Wei · 2026-06-02 04:00

An Improved Algorithm for Adversarial Linear Contextual Bandits via Reduction

arXiv:2508.11931v3 Announce Type: replace Abstract: We present an oracle-efficient, near-optimal algorithm for linear contextual bandits with adversarial losses and stochastic action sets, only requiring a linear optimization oracle for the action sets in each round. Our approach…
arXiv cs.LG TIER_1 English(EN) · Jikai Jin, Kenneth Hung, Sanath Kumar Krishnamurthy, Baoyi Shi, Congshan Zhang · 2026-06-02 04:00

Adaptive Exploration for Latent-State Bandits

arXiv:2602.05139v3 Announce Type: replace Abstract: We study bandits whose rewards depend on an unobserved Markov state that evolves independently of the learner's actions. The optimal arm can change even though the learner observes only past actions and rewards. We propose algor…
arXiv cs.LG TIER_1 English(EN) · Bingkui Tong, Junpei Komiyama, Soichiro Nishimori, Paavo Parmas · 2026-06-02 04:00

Finite-Time Regret Analysis of Retry-Aware Bandits

arXiv:2605.20854v2 Announce Type: replace Abstract: We study a stochastic bandit algorithm motivated by retry-aware objectives that value the best outcome among multiple attempts, such as pass@$k$ and max@$k$. Given a posterior over arm values, ReMax chooses a sampling distributi…
arXiv cs.AI TIER_1 English(EN) · William Overman, Mohsen Bayati · 2026-06-01 04:00

Annealed Softmax Greedy in Many-Armed Bayesian Bandits

arXiv:2605.31034v1 Announce Type: cross Abstract: Reinforcement learning with verifiable rewards (RLVR) and group-based policy optimization methods such as GRPO update a stochastic policy by sampling multiple completions per prompt and increasing the policy's probability on those…
arXiv cs.LG TIER_1 English(EN) · Shogo Iwazaki · 2026-05-29 04:00

Nearly-Optimal Algorithm for Adversarial Kernelized Bandits

arXiv:2605.10299v2 Announce Type: replace Abstract: This paper studies kernelized bandits (also known as Gaussian process bandits) in an adversarial environment, where the reward functions in a known reproducing kernel Hilbert space (RKHS) may be adversarially chosen at each roun…
arXiv cs.LG TIER_1 English(EN) · Arkaprava Gupta, Nicholas Carter, William Zellers, Prateek Ganguli, Benedikt Dietrich, Vibhor Krishna, Parasara Sridhar Duggirala, Samarjit Chakraborty · 2026-05-29 04:00

Bandit Algorithms for Deep Brain Stimulation

arXiv:2601.12699v2 Announce Type: replace Abstract: Deep Brain Stimulation (DBS) is an effective treatment for Parkinson's disease, but conventional fixed-parameter stimulation can reduce battery life and cause side effects while failing to adapt to changing neural dynamics. Rece…
arXiv cs.AI TIER_1 English(EN) · Xiangyi Wang, Pingchen Lu, Jie Mao, Mingze Kong, Zhi Hong, Zhiyong Wang, Zhongxiang Dai · 2026-05-27 04:00

Linear and Neural Dueling Bandits with Delayed Feedback

arXiv:2605.26554v1 Announce Type: cross Abstract: Contextual dueling bandits form a cornerstone of preference-based decision-making, with critical applications in recommender systems and large language model alignment. However, standard algorithms rely on the idealized assumption…
arXiv cs.LG TIER_1 English(EN) · Yu-Jie Zhang, Hao Qiu, Jonathan Scarlett, Kevin Jamieson · 2026-05-27 04:00

Near-Optimal Regret in Adversarial Kernel Bandits

arXiv:2605.26585v1 Announce Type: new Abstract: We study the adversarial kernel bandit problem, in which the loss at each round is induced by an arbitrary bounded element of a reproducing kernel Hilbert space (RKHS). We propose an exponential-weights algorithm built on a regulari…
arXiv cs.LG TIER_1 English(EN) · Emma Brunskill, Ishani Karmarkar, Zhaoqi Li · 2026-05-26 04:00

Active Learning for Stochastic Contextual Linear Bandits

arXiv:2605.24803v1 Announce Type: new Abstract: A key goal in stochastic contextual linear bandits is to efficiently learn a near-optimal policy. Prior algorithms for this problem learn a policy by strategically sampling actions but naively (passively) sampling contexts from the …
arXiv cs.LG TIER_1 English(EN) · Daniel Ezer, Alon Peled-Cohen, Yishay Mansour · 2026-05-26 04:00

Stochastic Linear Bandits with Parameter Noise

arXiv:2601.23164v2 Announce Type: replace Abstract: We study the stochastic linear bandits with parameter noise model, in which the reward of action $a$ is $a^\top \theta$ where $\theta$ is sampled i.i.d. We show a regret upper bound of $\widetilde{O} (\sqrt{d T \log (K/\delta) \…
arXiv stat.ML TIER_1 English(EN) · Foo Hui-Mean, Yuan-chin I Chang · 2026-06-04 04:00

ALMAB-DC: Active Learning, Multi-Armed Bandits, and Distributed Computing for Sequential Experimental Design and Black-Box Optimization

arXiv:2603.21180v4 Announce Type: replace-cross Abstract: Sequential experimental design under expensive, gradient-free objectives is a central challenge in computational statistics: evaluation budgets are tightly constrained and information must be extracted efficiently from eac…
arXiv stat.ML TIER_1 English(EN) · Kushagra Chandak, Toshinori Kitamura, Xiaoqi Tan · 2026-06-04 04:00

Offline-to-Online Learning in Linear Bandits

arXiv:2606.04305v1 Announce Type: cross Abstract: We study online learning with an additional offline dataset in the stochastic linear bandit setting. Although this problem arises frequently in practice, the offline-to-online tradeoff remains poorly understood in structured envir…
arXiv stat.ML TIER_1 English(EN) · Marc Abeille, David Janz, Ciara Pike-Burke · 2026-06-04 04:00

When and why randomised exploration works (in linear bandits)

arXiv:2502.08870v2 Announce Type: replace-cross Abstract: We provide an approach for the analysis of randomised exploration algorithms like Thompson sampling that does not rely on forced optimism or posterior inflation. With this, we demonstrate that in the $d$-dimensional linear…
arXiv stat.ML TIER_1 English(EN) · Youngmin Oh, Jinje Park, Taejin Paik · 2026-06-04 04:00

Neural Variance-aware Dueling Bandits with Deep Representation and Shallow Exploration

arXiv:2506.01250v3 Announce Type: replace-cross Abstract: We introduce the first variance-aware algorithms for contextual dueling bandits that leverage shallow exploration strategies with neural networks for nonlinear utility approximation. A key theoretical challenge is the abse…
arXiv stat.ML TIER_1 English(EN) · Xiaoqi Tan · 2026-06-03 00:18

Offline-to-Online Learning in Linear Bandits

We study online learning with an additional offline dataset in the stochastic linear bandit setting. Although this problem arises frequently in practice, the offline-to-online tradeoff remains poorly understood in structured environments. We propose a linear bandit algorithm that…
arXiv stat.ML TIER_1 English(EN) · Samya Praharaj, Chih-Yu Chang, Koulik Khamaru, Kelly W. Zhang · 2026-06-02 04:00

Bandit Simulation for Average Reward Inference

arXiv:2606.00913v1 Announce Type: new Abstract: Multi-arm bandit algorithms are increasingly used in online platforms, clinical trials, and social science experiments, but valid statistical inference on their performance remains an open challenge. After deploying bandits, a natur…
arXiv stat.ML TIER_1 English(EN) · Zhen Li (LMO, CELESTE, HEC Paris), Gilles Stoltz (LMO, CELESTE, HEC Paris) · 2026-06-02 04:00

A Direct Approach for Handling Contextual Bandits with Latent State Dynamics

arXiv:2604.08149v2 Announce Type: replace-cross Abstract: We consider a linear contextual bandit model where contexts and rewards are governed by a finite hidden Markov chain. We first revisit the simplified model by Nelson et al. (2022), in which rewards are linear functions of …
arXiv stat.ML TIER_1 English(EN) · Sanghoon Yu, Min-hwan Oh · 2026-06-02 04:00

Practical and Optimal Algorithm for Linear Contextual Bandits with Rare Parameter Updates

arXiv:2606.00984v1 Announce Type: new Abstract: We study linear contextual bandits under rare parameter updates: the learner may incorporate reward feedback into its parameter estimate only at a small number of update times, while still observing contexts online and selecting act…
arXiv stat.ML TIER_1 English(EN) · Andrew Jacobsen, Dorian Baudry, Shinji Ito, Nicol\`o Cesa-Bianchi · 2026-06-01 04:00

A Perturbation Approach to Unconstrained Linear Bandits

arXiv:2603.28201v2 Announce Type: replace-cross Abstract: We revisit the standard perturbation-based approach of Abernethy et al. (2008) in the context of unconstrained Bandit Linear Optimization (uBLO). We show the surprising result that in the unconstrained setting, this approa…
arXiv stat.ML TIER_1 English(EN) · Ivan Lau, Daniel McMorrow, Kevin Jamieson, Jonathan Scarlett · 2026-06-01 04:00

Batched Stochastic Linear Bandits with 1-Bit Communication Constraints

arXiv:2605.30976v1 Announce Type: new Abstract: We study stochastic linear bandits under a natural combination of batching and communication constraints: the time horizon is partitioned into batches of equal size $B$, and during each batch the learner sends $B$ requested arm pull…
arXiv stat.ML TIER_1 English(EN) · Min-hwan Oh · 2026-05-31 03:46

Practical and Optimal Algorithm for Linear Contextual Bandits with Rare Parameter Updates

We study linear contextual bandits under rare parameter updates: the learner may incorporate reward feedback into its parameter estimate only at a small number of update times, while still observing contexts online and selecting actions sequentially. This viewpoint clarifies a pr…
arXiv stat.ML TIER_1 English(EN) · Kelly W. Zhang · 2026-05-30 22:27

Bandit Simulation for Average Reward Inference

Multi-arm bandit algorithms are increasingly used in online platforms, clinical trials, and social science experiments, but valid statistical inference on their performance remains an open challenge. After deploying bandits, a natural question is whether one can construct a confi…
arXiv stat.ML TIER_1 English(EN) · Jonathan Scarlett · 2026-05-29 08:17

Batched Stochastic Linear Bandits with 1-Bit Communication Constraints

We study stochastic linear bandits under a natural combination of batching and communication constraints: the time horizon is partitioned into batches of equal size $B$, and during each batch the learner sends $B$ requested arm pulls to an agent, who then observes the correspondi…
arXiv stat.ML TIER_1 English(EN) · Seoungbin Bae, Dabeen Lee · 2026-05-29 04:00

Neural Logistic Bandits

arXiv:2505.02069v2 Announce Type: replace-cross Abstract: We study the problem of neural logistic bandits, where the main task is to learn an unknown reward function within a logistic link function using a neural network. Existing approaches either exhibit unfavorable dependencie…
arXiv stat.ML TIER_1 English(EN) · Chaiwon Kim, Jongyeong Lee, Min-hwan Oh · 2026-05-29 04:00

Follow-the-Perturbed-Leader for Decoupled Bandits: Best-of-Both-Worlds and Practicality

arXiv:2510.12152v2 Announce Type: replace Abstract: We study the decoupled multi-armed bandit problem, where the learner separately selects one arm for exploration and one, possibly different, arm for exploitation at each round. In this setting, the loss of the explored arm is ob…

COVERAGE [33]

RELATED ENTITIES

RELATED TOPICS