Prudent-Banker: No Extra Fees for Baseline Safety in Adversarial Bandits With and Without Delays

Prudent-Banker algorithm ensures safety under delayed feedback

By the PulseAugur editorial team · [2 sources] · 2026-05-22 08:14

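Researchers have introduced Prudent-Banker, a new algorithm for adversarial multi-armed bandits that maintains its safety guarantees even when feedback is delayed. The approach combines delay-adapted online mirror descent with a phased aggressiveness mechanism to keep regret relative to a safe baseline policy nearly constant. Its key innovation is a delay-calibrated restart threshold that rigorously accounts for feedback distortion and reliably detects suboptimality. Prudent-Banker achieves an optimal safety-robustness trade-off, with theoretical guarantees and experimental validation showing that it balances safety and learning across a range of delay distributions.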
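
The summary names delay-adapted online mirror descent as one ingredient. As a rough illustration only, the sketch below shows a generic exponential-weights (Exp3) learner, which is online mirror descent with a negative-entropy regularizer, updating from bandit feedback that arrives late. It is not the Prudent-Banker algorithm: the phased aggressiveness mechanism and the delay-calibrated restart threshold are not specified in this summary and are left out. The function name `delayed_exp3`, its input format, and the fixed learning rate `eta` are assumptions made for this example.

```python
import math
import random


def delayed_exp3(losses, delays, eta, seed=0):
    """Generic Exp3 under delayed bandit feedback (illustrative sketch only).

    losses[t][a] in [0, 1] is the loss of arm a at round t (hypothetical format).
    delays[t] >= 0 is the number of rounds before round t's loss is observed.
    Returns the list of arms played.
    """
    rng = random.Random(seed)
    n_rounds, n_arms = len(losses), len(losses[0])
    cum_est = [0.0] * n_arms  # cumulative importance-weighted loss estimates
    pending = {}              # arrival round -> list of (arm, prob, loss)
    played = []

    for t in range(n_rounds):
        # Exponential-weights distribution built from feedback received so far.
        # Subtracting the minimum keeps the exponentials numerically stable.
        base = min(cum_est)
        weights = [math.exp(-eta * (c - base)) for c in cum_est]
        total = sum(weights)
        probs = [w / total for w in weights]

        arm = rng.choices(range(n_arms), weights=probs)[0]
        played.append(arm)

        # The played arm's loss becomes visible only delays[t] rounds later.
        arrival = t + delays[t]
        pending.setdefault(arrival, []).append((arm, probs[arm], losses[t][arm]))

        # Apply every piece of feedback that arrives at the end of round t.
        for a, p, loss in pending.pop(t, []):
            cum_est[a] += loss / p  # importance-weighted (unbiased) estimate

    return played
```

Called with, for example, `delayed_exp3(losses, delays, eta=0.1)`, the learner keeps sampling from stale weights until each delayed loss arrives, which is the distortion a delay-aware method has to account for. Feedback scheduled to arrive after the final round is simply never used in this sketch.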

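Impact: Introduces a novel safe decision-making algorithm for complex bandit settings, which could improve the reliability of AI agents in real-world scenarios where feedback is uncertain.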

Ranking rationale: This cluster contains a research paper detailing a new algorithm for a specific machine learning problem.


AI-generated summary · Google Gemini · From 2 sources.

Coverage sources [2]

arXiv cs.LG TIER_1 · Ting Hu, Luanda Cai, Emmanouil-Vasileios Vlatakis-Gkaragkounis · 2026-05-25 04:00

Prudent-Banker: No Extra Fees for Baseline Safety in Adversarial Bandits With and Without Delays

arXiv:2605.23351v1 · Abstract: We study adversarial multi-armed bandits with and without delayed feedback under a safety-aware goal: achieving minimax-optimal worst-case regret while keeping nearly constant regret relative to a designated "safe" baseline policy. …

arXiv cs.LG TIER_1 · Emmanouil-Vasileios Vlatakis-Gkaragkounis · 2026-05-22 08:14

Prudent-Banker: No Extra Fees for Baseline Safety in Adversarial Bandits With and Without Delays

We study adversarial multi-armed bandits with and without delayed feedback under a safety-aware goal: achieving minimax-optimal worst-case regret while keeping nearly constant regret relative to a designated "safe" baseline policy. Existing approaches can balance this trade-off w…
