PulseAugur
English (EN) · Off-Policy Evaluation for Missingness-Aware Policies in MDPs with Rewards Missing Not at Random

New off-policy evaluation method addresses missing rewards in reinforcement learning

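Researchers have developed a new off-policy evaluation (OPE) method for reinforcement learning settings in which rewards are missing not at random (MNAR). The method uses future states as shadow variables to identify the conditional mean reward of the complete data, which corrects the resulting selection bias. The proposed estimator, inspired by Fitted-Q-Evaluation (FQE), allows the target policy to take past missingness indicators into account, and it performs strongly in experiments on simulated data and on the MIMIC-III sepsis dataset.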
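The summary says only that the estimator is "inspired by" Fitted-Q-Evaluation, so the sketch below shows plain tabular FQE on fully observed rewards as background. It is not the authors' estimator: it has no shadow-variable step and no missingness indicators, and the function name, the toy transitions and the `always_go` policy are hypothetical.

```python
from collections import defaultdict


def fitted_q_evaluation(transitions, target_policy, gamma=0.9, n_iters=200):
    """Plain tabular Fitted-Q-Evaluation (no missing-reward handling).

    transitions   -- list of (state, action, reward, next_state) tuples
    target_policy -- function mapping a state to {action: probability}
    Returns a dict mapping (state, action) to the estimated Q value of target_policy.
    """
    q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
    for _ in range(n_iters):
        sums = defaultdict(float)
        counts = defaultdict(int)
        for s, a, r, s_next in transitions:
            # Bellman target: r + gamma * E_{a' ~ target_policy(s')}[Q(s', a')]
            v_next = sum(p * q[(s_next, a2)] for a2, p in target_policy(s_next).items())
            sums[(s, a)] += r + gamma * v_next
            counts[(s, a)] += 1
        # Tabular "regression": the fitted Q is the per-(state, action) mean target
        q = defaultdict(float, {k: sums[k] / counts[k] for k in sums})
    return q


def always_go(state):
    """Hypothetical deterministic target policy: always pick "go"."""
    return {"go": 1.0}


# Hypothetical logged data: in state 0, "go" earns 1 and moves to state 1,
# "stay" earns 0 and stays put; in state 1, "go" earns 2 and stays in state 1.
logged = [(0, "go", 1.0, 1), (0, "stay", 0.0, 0), (1, "go", 2.0, 1)]

q = fitted_q_evaluation(logged, always_go)
# Value of the target policy from state 0: average Q over the policy's actions
policy_value = sum(p * q[(0, a)] for a, p in always_go(0).items())
print(round(q[(1, "go")], 4), round(q[(0, "go")], 4), round(q[(0, "stay")], 4))
```

With gamma = 0.9 the Bellman fixed point of this toy problem is Q(1, go) = 2 / (1 - 0.9) = 20, Q(0, go) = 1 + 0.9 * 20 = 19 and Q(0, stay) = 0.9 * 19 = 17.1, which the iterates approach. As an illustrative assumption only, a missingness-aware target policy could be expressed in this interface by folding the previous step's missingness indicator into the state tuple.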

Impact: Improves the reliability of reinforcement learning models in real-world settings where data are incomplete.

Ranking rationale: Academic paper on a new machine learning method.

Read on arXiv stat.ML →

AI-generated summary · Google Gemini · from 2 sources. How we write summaries →


Coverage sources [2]

  1. arXiv stat.ML TIER_1 English (EN) · Ziheng Wei, Annie Qu, Rui Miao

    Off-Policy Evaluation for Missingness-Aware Policies in MDPs with Rewards Missing Not at Random

    arXiv:2606.20206v1 Announce Type: new Abstract: In offline Reinforcement Learning, immediate rewards in logged batch data are often unobserved due to sparse or irregular record-keeping, or censored beyond certain reward values. This issue arises in practical settings, including h…

  2. arXiv stat.ML TIER_1 English (EN) · Rui Miao

    Off-Policy Evaluation for Missingness-Aware Policies in MDPs with Rewards Missing Not at Random

    In offline Reinforcement Learning, immediate rewards in logged batch data are often unobserved due to sparse or irregular record-keeping, or censored beyond certain reward values. This issue arises in practical settings, including health care and marketing. We investigate off-pol…
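The abstract's example of rewards "censored beyond certain reward values" is a case of missing not at random: whether a reward is recorded depends on the reward itself. The toy sketch below, with made-up rewards and a hypothetical threshold, shows why simply averaging the observed rewards is biased. It only illustrates the problem and does not implement the paper's shadow-variable correction.

```python
from statistics import mean


def complete_case_mean(rewards, observed):
    """Average only the rewards flagged as observed (the naive approach)."""
    return mean(r for r, seen in zip(rewards, observed) if seen)


# Hypothetical rewards 0..9; anything above the threshold is never recorded,
# so missingness depends on the unobserved value itself (MNAR).
rewards = list(range(10))
threshold = 6
observed = [r <= threshold for r in rewards]

full_mean = mean(rewards)  # needs every reward, unavailable in real logged data
naive_mean = complete_case_mean(rewards, observed)  # uses only rewards 0..6
print(f"full-data mean: {full_mean}, complete-case mean: {naive_mean}")
```

Dropping the censored rewards pulls the estimate down from 4.5 to 3. The shadow-variable approach described in the summary is aimed at correcting this kind of selection bias.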