Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization

New Algorithm Balances User Reward and Statistical Accuracy in Experiments

By the PulseAugur Editorial Team · [1 source] · 2026-05-20 04:00

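Researchers have developed a new algorithm, TS-PostDiff, that aims to improve the trade-off between user benefit and statistical accuracy in online experiments. Traditional methods such as uniform random (UR) assignment are statistically reliable but slow to adapt, while multi-armed bandit algorithms such as Thompson Sampling (TS) can quickly optimize user engagement but may introduce statistical bias. TS-PostDiff combines the two approaches: it relies on Thompson Sampling when the difference between arms appears large and reverts to uniform random assignment when the difference appears small, reducing false positives and increasing statistical power compared with Thompson Sampling alone.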
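The mixing rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes two arms with binary (Bernoulli) rewards, independent Beta(1, 1) priors, and a hypothetical threshold `c` on the absolute difference in success rates. It reads the rule as "assign uniformly at random with probability equal to the posterior probability that the arms differ by less than `c`, otherwise use Thompson Sampling", and estimates that posterior probability by Monte Carlo. The function names `prob_small_diff`, `ts_postdiff_choose`, and `run_experiment` are hypothetical.

```python
import random


def prob_small_diff(alpha, beta, c, rng, n_draws=500):
    """Monte Carlo estimate of P(|p0 - p1| < c) under independent Beta posteriors.

    alpha[i], beta[i] are the Beta posterior parameters for arm i.
    """
    hits = 0
    for _ in range(n_draws):
        p0 = rng.betavariate(alpha[0], beta[0])
        p1 = rng.betavariate(alpha[1], beta[1])
        if abs(p0 - p1) < c:
            hits += 1
    return hits / n_draws


def ts_postdiff_choose(alpha, beta, c, rng):
    """Pick an arm (0 or 1) with a TS-PostDiff-style mixing rule (sketch only)."""
    # With probability equal to the posterior chance that the arms differ by
    # less than the hypothetical threshold c, fall back to uniform random (UR).
    if rng.random() < prob_small_diff(alpha, beta, c, rng):
        return rng.randrange(2)
    # Otherwise use Thompson Sampling: one posterior draw per arm, pick the larger.
    draws = [rng.betavariate(alpha[i], beta[i]) for i in range(2)]
    return 0 if draws[0] >= draws[1] else 1


def run_experiment(true_p, n, c, seed=0):
    """Simulate n participants with Bernoulli rewards; return counts and posteriors."""
    rng = random.Random(seed)
    alpha, beta = [1.0, 1.0], [1.0, 1.0]  # Beta(1, 1) priors (assumption)
    counts = [0, 0]
    for _ in range(n):
        arm = ts_postdiff_choose(alpha, beta, c, rng)
        reward = 1 if rng.random() < true_p[arm] else 0
        alpha[arm] += reward      # a success increments alpha
        beta[arm] += 1 - reward   # a failure increments beta
        counts[arm] += 1
    return counts, alpha, beta


# Usage: two arms with identical true success rates and a hypothetical c = 0.1.
counts, alpha, beta = run_experiment([0.5, 0.5], n=300, c=0.1, seed=1)
print("assignments per arm:", counts)
```

When the arms are truly equal, the posterior probability of a small difference tends to stay high, so most assignments in this sketch fall back to UR; when the arms clearly differ, that probability drops toward zero and the rule behaves like plain Thompson Sampling. The Monte Carlo estimate adds noise to the mixing probability and could be replaced by numerical integration of the two Beta posteriors.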

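Impact: Offers a more statistically reliable approach to adaptive experimentation that could make online A/B testing and reinforcement learning applications more efficient and reliable.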

Ranking rationale: An academic paper detailing the new algorithm was published.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv stat.ML · Tier 1 · English · Tong Li, Jacob Nogas, Haochen Song, Anna Rafferty, Eric M. Schwartz, Audrey Durand, Harsh Kumar, Nina Deliu, Sofia S. Villar, Dehan Kong, Joseph J. Williams · 2026-05-20 04:00

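Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization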

arXiv:2112.08507v5 Announce Type: replace-cross Abstract: Traditional randomized A/B experiments assign arms with uniform random (UR) probability, such as 50/50 assignment to two versions of a website to discover whether one version engages users more. To more quickly and automat…
