PulseAugur

New research tackles evolving action spaces and exploration in contextual bandits

Two new research papers explore advanced techniques for contextual bandits, a machine learning framework used in recommendation and decision-making systems. The first introduces PONA, a method that uses action features to select new actions even when the action space changes after the initial data was collected. The second, RIE-Greedy, proposes an exploration strategy that relies on the randomness inherent in model fitting and regularization; the authors show it is theoretically equivalent to Thompson Sampling in certain cases and report that it is effective in practice in business environments.

Summary written by gemini-2.5-flash-lite from 3 sources.
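
For readers unfamiliar with Thompson Sampling, the baseline that the summary relates RIE-Greedy to, here is a minimal sketch in a toy setting: a finite set of contexts, Bernoulli rewards, and an independent Beta posterior per (context, arm) pair. It is not the RIE-Greedy or PONA algorithm. The function name `thompson_sampling`, the `true_means` table, and the uniform context arrival are all assumptions made for illustration.

```python
import random


def thompson_sampling(true_means, rounds=5000, seed=0):
    """Beta-Bernoulli Thompson Sampling with one posterior per (context, arm).

    true_means[c][a] is a hypothetical success probability of arm a in
    context c; it is used only to simulate rewards.
    Returns, for each context, the arm with the highest posterior mean.
    """
    rng = random.Random(seed)
    n_contexts = len(true_means)
    n_arms = len(true_means[0])
    # Beta(1, 1) priors: both counts start at 1 for every (context, arm).
    successes = [[1] * n_arms for _ in range(n_contexts)]
    failures = [[1] * n_arms for _ in range(n_contexts)]

    for _ in range(rounds):
        # Assumption: contexts arrive uniformly at random.
        c = rng.randrange(n_contexts)
        # Draw one sample per arm from its posterior, then act greedily
        # on the samples. The sampling noise is the only exploration.
        samples = [
            rng.betavariate(successes[c][a], failures[c][a])
            for a in range(n_arms)
        ]
        a = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < true_means[c][a] else 0
        successes[c][a] += reward
        failures[c][a] += 1 - reward

    recommendations = []
    for c in range(n_contexts):
        posterior_means = [
            successes[c][a] / (successes[c][a] + failures[c][a])
            for a in range(n_arms)
        ]
        recommendations.append(
            max(range(n_arms), key=posterior_means.__getitem__)
        )
    return recommendations
```

Calling `thompson_sampling([[0.1, 0.9], [0.8, 0.2]])` simulates two contexts with two arms each and returns one recommended arm index per context. Arms with few observations have wide posteriors, so they occasionally draw a high sample and get tried; that is the behavior a randomness-driven greedy strategy would need to reproduce.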

IMPACT These papers advance contextual bandit algorithms, potentially improving recommendation systems and decision-making in dynamic environments.

RANK_REASON Two academic papers published on arXiv detailing new methods for contextual bandits.


COVERAGE [3]

  1. arXiv cs.LG TIER_1 · Negar Kiyavash

    Active Context Selection Improves Simple Regret in Contextual Bandits

    We study the contextual multi-armed bandit problem with a finite context space (a.k.a. subpopulations), where the learner recommends a best action for each context and is evaluated by context-weighted simple regret. Our guarantees are worst-case over the reward distributions, whi…

  2. arXiv cs.LG TIER_1 · Yuta Saito

    Offline Contextual Bandits in the Presence of New Actions

    Automated decision-making algorithms drive applications such as recommendation systems and search engines. These algorithms often rely on off-policy contextual bandits or off-policy learning (OPL). Conventionally, OPL selects actions that maximize the expected reward from an exis…

  3. arXiv stat.ML TIER_1 · Tong Li, Thiago de Queiroz Casanova, Eric M. Schwartz, Victor Kostyuk, Dehan Kong, Joseph J. Williams

    RIE-Greedy: Regularization-Induced Exploration for Contextual Bandits

    arXiv:2603.11276v2. Real-world contextual bandit problems with complex reward models are often tackled with iteratively trained models, such as boosting trees. However, it is difficult to directly apply simple and effective exploration strategies--…
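
The first entry in the list above evaluates a learner by context-weighted simple regret over a finite context space. The abstract is truncated before any formula appears, so the following is only one common way to write that quantity, and it may differ from the paper's exact definition: for each context, take the gap between the best arm's mean reward and the recommended arm's mean reward, then average the gaps using the context weights. The function name and argument layout are hypothetical.

```python
def context_weighted_simple_regret(weights, mean_rewards, recommended):
    """Weighted sum over contexts of (best mean - recommended arm's mean).

    weights[c]         -- assumed weight (e.g. population share) of context c
    mean_rewards[c][a] -- assumed true mean reward of arm a in context c
    recommended[c]     -- arm index the learner recommends for context c
    """
    regret = 0.0
    for w, means, arm in zip(weights, mean_rewards, recommended):
        # Per-context gap is zero when the recommended arm is optimal.
        regret += w * (max(means) - means[arm])
    return regret
```

A learner that recommends the optimal arm in every context scores zero under this measure, and a mistake in a heavily weighted context costs more than the same mistake in a rare one.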