New bandit algorithm optimizes recommender systems for long-term user satisfaction

By the PulseAugur editorial team · [1 source] · 2026-06-24 04:00

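Researchers have developed a new method for optimizing recommender systems for long-term user satisfaction, addressing the challenge of delayed rewards. The method combines short-term proxy outcomes with delayed rewards and uses a Bayesian filter to build a predictive model. That model then informs a bandit algorithm designed to quickly identify content that leads to long-term user engagement. An A/B test on a large-scale podcast recommendation system showed that the approach outperforms methods that rely solely on short-term proxies or solely on delayed rewards.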
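The idea of folding an immediate proxy and a later delayed reward into one belief that drives a bandit can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions and not the paper's model: it treats the proxy and the delayed reward as independent Gaussian observations of the same unknown long-term value, uses a plain conjugate Gaussian update in place of the Bayesian filter, and uses Thompson sampling as the bandit rule. All names (`GaussianBelief`, `thompson_pick`, `simulate`) and all parameter values are hypothetical.

```python
import random
from collections import deque


class GaussianBelief:
    """Gaussian belief over one item's long-term reward (illustrative model)."""

    def __init__(self, mean=0.0, var=1.0):
        self.mean = mean
        self.var = var

    def update(self, obs, noise_var):
        # Conjugate Gaussian update for an observation with known noise variance.
        precision = 1.0 / self.var + 1.0 / noise_var
        self.mean = (self.mean / self.var + obs / noise_var) / precision
        self.var = 1.0 / precision

    def sample(self, rng):
        return rng.gauss(self.mean, self.var ** 0.5)


def thompson_pick(beliefs, rng):
    # Draw one sample per item and recommend the item with the largest draw.
    draws = [b.sample(rng) for b in beliefs]
    return max(range(len(draws)), key=draws.__getitem__)


def simulate(true_means, rounds=2000, delay=50,
             proxy_noise_var=4.0, delayed_noise_var=0.25, seed=0):
    rng = random.Random(seed)
    beliefs = [GaussianBelief() for _ in true_means]
    pending = deque()  # (round_due, item, delayed_observation), in due order
    pulls = [0] * len(true_means)
    for t in range(rounds):
        # Fold in delayed rewards that have matured by this round.
        while pending and pending[0][0] <= t:
            _, item, obs = pending.popleft()
            beliefs[item].update(obs, delayed_noise_var)
        item = thompson_pick(beliefs, rng)
        pulls[item] += 1
        # A noisy short-term proxy is observed immediately (assumed noise level).
        proxy = true_means[item] + rng.gauss(0.0, proxy_noise_var ** 0.5)
        beliefs[item].update(proxy, proxy_noise_var)
        # A more precise long-term signal arrives only after `delay` rounds.
        delayed = true_means[item] + rng.gauss(0.0, delayed_noise_var ** 0.5)
        pending.append((t + delay, item, delayed))
    return pulls, beliefs
```

Calling `simulate([0.0, 0.3, 1.0])` returns how often each simulated item was recommended along with the final beliefs. Because the proxy updates the belief right away, the sampler can start shifting toward promising items before any delayed reward has arrived, which is the intuition the summary describes.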

Impact: This research could lead to more effective recommender systems that better meet long-term user engagement goals.

Ranking rationale: An academic paper on a new algorithm for recommender systems.


Kelly Zhang

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv stat.ML TIER_1 English(EN) · Kelly W. Zhang, Thomas Baldwin-McDonald, Kamil Ciosek, Lucas Maystre, Daniel Russo · 2026-06-24 04:00

Impatient Bandits: Optimizing for the Long-Term Without Delay

arXiv:2501.07761v2 Announce Type: replace-cross Abstract: Increasingly, recommender systems are tasked with improving users' long-term satisfaction. In this context, we study a content exploration task, which we formalize as a bandit problem with delayed rewards. There is an appa…
