Best Arm Identification with Minimal Regret
Researchers have introduced a new problem called best arm identification (BAI) with minimal regret, which combines two objectives of the multi-armed bandit problem: identifying the best arm and minimizing cumulative regret. The study focuses on single-parameter exponential families and uses information-theoretic techniques to establish a lower bound on the expected cumulative regret. An impossibility result highlights the trade-off between cumulative regret and sample complexity in fixed-confidence BAI, and the proposed Double KL-UCB algorithm is shown to be asymptotically optimal as the confidence level decreases.
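To make the two quantities in this summary concrete, the sketch below computes a standard KL-UCB index and the cumulative regret for Bernoulli arms, one example of a single-parameter exponential family. It is not the Double KL-UCB algorithm from the study, whose details are not given here. The function names, the Bernoulli assumption, and the `budget` parameter (the exploration level, for example a log term) are illustrative assumptions.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence KL(Ber(p) || Ber(q)); inputs are clipped to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, pulls, budget, tol=1e-9):
    """Largest q >= mean such that pulls * KL(mean, q) <= budget.

    KL(mean, q) is increasing in q on [mean, 1], so bisection applies.
    `budget` is an assumed exploration level, not a value from the study.
    """
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pulls * bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


def cumulative_regret(means, pull_counts):
    """Sum over arms of (best mean - arm mean) * number of pulls of that arm."""
    best = max(means)
    return sum((best - m) * n for m, n in zip(means, pull_counts))


if __name__ == "__main__":
    budget = math.log(100)
    # The index tightens toward the empirical mean as an arm is pulled more.
    print(kl_ucb_index(0.5, 10, budget), kl_ucb_index(0.5, 1000, budget))
    # Hypothetical two-arm instance: 10 pulls of the worse arm cost 0.2 each.
    print(cumulative_regret([0.5, 0.7], [10, 90]))
```

The trade-off described above shows up even in this toy setting: every pull of a suboptimal arm adds to the regret, yet those pulls are what shrink the arm's confidence index enough to rule it out.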