Brief · PulseAugur

Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization

Researchers have developed an algorithm called TS-PostDiff that aims to balance user benefit against statistical accuracy in online experiments. Uniform random assignment is statistically sound but never shifts assignment toward the better-performing arm, while multi-armed bandit algorithms such as Thompson Sampling quickly shift assignment toward higher-reward arms but can bias the resulting statistical analysis. TS-PostDiff mixes the two: it relies on Thompson Sampling when the estimated difference between arms is large and falls back to uniform random assignment when the difference is small, which reduces false positives and increases statistical power compared with Thompson Sampling alone.

IMPACT Offers a more statistically sound approach to adaptive experimentation, potentially improving the efficiency and reliability of online A/B testing and reinforcement learning applications.
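
The mixing rule in the summary above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes two arms, Bernoulli (success/failure) rewards, Beta(1, 1) priors, and a "small difference" threshold called c. The function names ts_postdiff_choose and simulate are hypothetical, as is the choice to draw fresh posterior samples for the Thompson Sampling step.

```python
import random


def ts_postdiff_choose(successes, failures, c, rng):
    """Pick arm 0 or 1 for the next participant (illustrative sketch).

    Assumes two Bernoulli arms with Beta(1, 1) priors; `c` is the
    threshold below which a difference between arms counts as "small".
    """
    def sample_posteriors():
        # One draw per arm from its Beta posterior.
        return [
            rng.betavariate(1 + successes[i], 1 + failures[i])
            for i in range(2)
        ]

    draws = sample_posteriors()
    if abs(draws[0] - draws[1]) < c:
        # Sampled difference is small: fall back to uniform random assignment.
        return rng.randrange(2)

    # Sampled difference is large: take a Thompson Sampling step.
    # Drawing fresh samples here is an assumption of this sketch.
    fresh = sample_posteriors()
    return 0 if fresh[0] > fresh[1] else 1


def simulate(true_rates, n, c, seed=0):
    """Run n assignments against hypothetical true success rates."""
    rng = random.Random(seed)
    successes, failures = [0, 0], [0, 0]
    for _ in range(n):
        arm = ts_postdiff_choose(successes, failures, c, rng)
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    s, f = simulate(true_rates=(0.45, 0.55), n=1000, c=0.1)
    print("pulls per arm:", [s[i] + f[i] for i in range(2)])
```

In this sketch, setting c to 0 means the uniform branch is never taken, so the procedure reduces to plain Thompson Sampling; a c above 1 makes every assignment uniform random. Values in between control how much reward is traded for more balanced data.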

RESEARCH · arXiv cs.LG English(EN) · 1w · [9 sources]

Offline Contextual Bandits in the Presence of New Actions

Several recent papers address contextual bandit problems, focusing on improving regret bounds and handling dynamic environments. One paper introduces a retry-aware bandit algorithm that optimizes for the best outcome among multiple attempts and proves the first sublinear regret bound for this objective. Another study proposes active context selection to improve simple regret in contextual bandits, showing significant improvements over passive sampling. A method called PONA is presented for offline contextual bandits; it uses action features to learn about and select new actions, outperforming existing methods that are limited to pre-defined action sets. Finally, an approach called RIE-Greedy uses regularization-induced exploration in contextual bandits, demonstrating theoretical equivalence to Thompson Sampling and practical effectiveness.

IMPACT These papers introduce novel algorithms and theoretical analyses for contextual bandit problems, potentially improving decision-making in recommendation systems and other applications.