New OPE method tackles missing rewards in reinforcement learning

By PulseAugur Editorial Team · [1 source] · 2026-06-18 13:19

Researchers have proposed a method for off-policy evaluation (OPE) in offline reinforcement learning when logged rewards are missing not at random (MNAR), that is, when the chance that a reward is recorded depends on the reward itself. The problem is common in health care and marketing, and ignoring it biases policy evaluation. The approach uses future states as shadow variables to identify the true mean reward and introduces a bridge function so that this mean can be estimated without explicitly modeling the missingness mechanism. The resulting estimator lets target policies condition on past missingness indicators and comes with theoretical guarantees on consistency and error bounds.
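To see why MNAR rewards bias evaluation, consider a minimal simulation. It is an illustrative sketch, not the estimator from the paper: the binary rewards, the observation probabilities in `p_obs`, and the function name `simulate` are all assumptions made for this example. It compares the true mean reward with the naive average over observed rewards, and with an oracle inverse-probability-weighted average that is only available when the missingness mechanism is known, which is exactly the knowledge the paper's bridge-function approach is designed to avoid needing.

```python
import random


def simulate(n=20000, seed=0):
    """Compare mean-reward estimates when rewards are missing not at random."""
    rng = random.Random(seed)
    # Hypothetical missingness mechanism: a reward of 1 is recorded far more
    # often than a reward of 0, so missingness depends on the reward itself.
    p_obs = {1: 0.9, 0: 0.3}

    rewards, observed = [], []
    for _ in range(n):
        r = 1 if rng.random() < 0.5 else 0
        rewards.append(r)
        observed.append(rng.random() < p_obs[r])

    # Ground truth, computable here only because the data are simulated.
    true_mean = sum(rewards) / n

    # Naive complete-case estimate: average the rewards that were recorded.
    seen = [r for r, o in zip(rewards, observed) if o]
    naive_mean = sum(seen) / len(seen)

    # Oracle correction: self-normalized inverse-probability weighting,
    # which assumes the observation probabilities are known exactly.
    weights = [1.0 / p_obs[r] for r in seen]
    oracle_ipw = sum(w * r for w, r in zip(weights, seen)) / sum(weights)

    return true_mean, naive_mean, oracle_ipw


true_mean, naive_mean, oracle_ipw = simulate()
print(f"true mean:  {true_mean:.3f}")
print(f"naive mean: {naive_mean:.3f}")
print(f"oracle IPW: {oracle_ipw:.3f}")
```

Under these assumed probabilities the naive average drifts toward 0.75 while the true mean is near 0.5, because high rewards are over-represented among the recorded ones. The oracle weighting removes the gap, but only by using information that is typically unavailable in logged data.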

Impact: This research could improve the reliability of reinforcement learning evaluation in real-world applications where reward data is incomplete.

Ranking rationale: Academic paper detailing a novel methodology for a specific machine learning problem.

Read on arXiv stat.ML →

AI-generated summary · Google Gemini · from 1 source.


Sources [1]

arXiv stat.ML TIER_1 English(EN) · Rui Miao · 2026-06-18 13:19

Off-Policy Evaluation for Missingness-Aware Policies in MDPs with Randomly Missing Rewards

In offline Reinforcement Learning, immediate rewards in logged batch data are often unobserved due to sparse or irregular record-keeping, or censored beyond certain reward values. This issue arises in practical settings, including health care and marketing. We investigate off-pol…

