New Framework Unifies Objectives in Dueling Bandit Problems

By PulseAugur Editorial Team · [2 sources] · 2026-06-01 07:17

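Researchers have introduced a framework called Tree-Guided Identify-Then-Exploit (TG-ITE) that addresses multiple objectives in stochastic dueling bandits. The unified approach targets best-arm identification (BAI), weak regret minimization, and strong regret minimization together. TG-ITE first identifies a high-confidence incumbent arm, then applies an exploitation strategy tailored to the specific objective, with the aim of improving sample complexity and enabling joint optimization.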
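The identify-then-exploit pattern described above can be illustrated with a minimal Python sketch. It is not the TG-ITE algorithm from the paper: the tree-guided identification step is replaced here by a simple sequential champion-versus-challenger elimination with Hoeffding-style confidence bounds. The preference matrix `EXAMPLE_P`, the function names (`duel`, `identify_incumbent`, `exploit_pair`, `pair_regret`, `run`), and the regret convention (weak regret as the smaller gap of the dueled pair, strong regret as the larger) are all assumptions made for illustration.

```python
import math
import random

# Hypothetical 4-arm preference matrix: P[i][j] = probability that arm i beats arm j.
# Arm 2 beats every other arm with probability 0.8, so it is the Condorcet winner.
EXAMPLE_P = [
    [0.5, 0.7, 0.2, 0.6],
    [0.3, 0.5, 0.2, 0.5],
    [0.8, 0.8, 0.5, 0.8],
    [0.4, 0.5, 0.2, 0.5],
]


def duel(P, i, j, rng):
    """Simulate one noisy comparison; True means arm i beat arm j."""
    return rng.random() < P[i][j]


def identify_incumbent(P, delta, rng, max_duels_per_pair=100_000):
    """Identify phase (illustrative stand-in, NOT the paper's tree-guided procedure).

    Keeps a champion and challenges it with each remaining arm until a
    Hoeffding-style confidence interval on the win rate excludes 0.5.
    Returns (incumbent_arm, number_of_duels_used).
    """
    n = len(P)
    champion, duels = 0, 0
    for challenger in range(1, n):
        wins = 0
        for t in range(1, max_duels_per_pair + 1):
            duels += 1
            wins += duel(P, champion, challenger, rng)
            # Radius chosen so a union bound over rounds and pairs stays below delta.
            radius = math.sqrt(math.log(4 * n * t * t / delta) / (2 * t))
            mean = wins / t
            if mean - radius > 0.5:
                break  # champion confidently beats the challenger
            if mean + radius < 0.5:
                champion = challenger  # challenger confidently beats the champion
                break
    return champion, duels


def exploit_pair(incumbent, objective, n, rng):
    """Exploit phase: choose the next pair according to the objective."""
    if objective == "strong":
        return incumbent, incumbent  # both arms must be the winner for zero strong regret
    if objective == "weak":
        return incumbent, rng.randrange(n)  # one winner in the pair gives zero weak regret
    raise ValueError(f"unknown objective: {objective}")


def pair_regret(P, winner, i, j, objective):
    """Per-round regret of dueling (i, j), under the convention assumed above."""
    gap_i, gap_j = P[winner][i] - 0.5, P[winner][j] - 0.5
    return max(gap_i, gap_j) if objective == "strong" else min(gap_i, gap_j)


def run(P, objective, horizon, delta=0.01, seed=0):
    """Identify an incumbent, then exploit it for the remaining rounds."""
    rng = random.Random(seed)
    incumbent, used = identify_incumbent(P, delta, rng)
    remaining = max(0, horizon - used)
    pairs = [exploit_pair(incumbent, objective, len(P), rng) for _ in range(remaining)]
    return incumbent, used, pairs


if __name__ == "__main__":
    incumbent, used, pairs = run(EXAMPLE_P, "weak", horizon=5000)
    print("incumbent:", incumbent, "duels used to identify:", used)
```

For pure best-arm identification, the sketch would simply stop after `identify_incumbent` and report the incumbent. The two exploit branches differ because, once the incumbent is the Condorcet winner, pairing it with any arm incurs no weak regret, while strong regret vanishes only when the winner is dueled against itself.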

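Impact: Introduces a novel theoretical framework for optimizing decision-making in bandit problems, with potential relevance to recommender systems and online learning.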

Ranking rationale: This cluster contains an academic paper detailing a new framework for a specific machine learning problem.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv stat.ML TIER_1 English(EN) · Pu Wang, Yao-Xiang Ding · 2026-06-02 04:00

Tree-Guided Identify-Then-Exploit: A Unified Framework of Best Arm Identification and Regret Minimization for Dueling Bandits

arXiv:2606.01799v1 Announce Type: cross Abstract: We study $N$-armed stochastic dueling bandits under the Condorcet-winner assumption, where three widely adopted objectives are considered: best-arm identification (BAI), weak regret, and strong regret. We propose Tree-Guided Ident…
arXiv stat.ML TIER_1 English(EN) · Yao-Xiang Ding · 2026-06-01 07:17

Tree-Guided Identify-Then-Exploit: A Unified Framework of Best Arm Identification and Regret Minimization for Dueling Bandits

We study $N$-armed stochastic dueling bandits under the Condorcet-winner assumption, where three widely adopted objectives are considered: best-arm identification (BAI), weak regret, and strong regret. We propose Tree-Guided Identify-Then-Exploit (TG-ITE), the first unified frame…
