Researchers have introduced a new problem, best arm identification (BAI) with minimal regret, which combines two objectives in the multi-armed bandit setting: identifying the best arm and minimizing cumulative regret. The study focuses on single-parameter exponential families and uses information-theoretic techniques to establish a lower bound on the expected cumulative regret. An impossibility result highlights the trade-off between regret and sample complexity in fixed-confidence BAI, and the proposed Double KL-UCB algorithm is shown to be asymptotically optimal as the confidence level tends to zero.
Ranking rationale: The cluster contains an academic paper detailing a new problem formulation and algorithm in machine learning.
AI-generated summary · Google Gemini · from 1 source.