Researchers have developed a method for off-policy evaluation (OPE) in reinforcement learning when rewards are missing not at random (MNAR), that is, when the chance that a reward is recorded depends on the value of the reward itself. This kind of missingness is common in fields such as healthcare and marketing, and ignoring it can bias policy evaluation. The approach uses future states as shadow variables to identify the true mean reward, and introduces a bridge function so that this mean can be estimated without explicitly modeling the missingness mechanism. The authors then propose an estimator that allows target policies to depend on past missingness indicators, with theoretical guarantees on consistency and error bounds.
AI impact: This research could improve the reliability of reinforcement learning evaluation in real-world applications where reward data is incomplete.
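To make the bias mentioned in the summary concrete, here is a minimal simulation sketch. It is not the paper's estimator and uses no shadow variables or bridge function. It only shows why averaging the observed rewards can mislead under MNAR. The reward distribution, the logistic missingness mechanism, and the function name `simulate_mnar_bias` are all assumptions made for illustration.

```python
import numpy as np


def simulate_mnar_bias(n=100_000, seed=0):
    """Compare the true mean reward with the complete-case mean under MNAR.

    Hypothetical setup (not taken from the paper): rewards are N(1, 1) and
    the probability that a reward is observed increases with the reward.
    """
    rng = np.random.default_rng(seed)
    rewards = rng.normal(loc=1.0, scale=1.0, size=n)

    # Assumed MNAR mechanism: observation probability is a logistic
    # function of the reward, so low rewards go missing more often.
    p_observed = 1.0 / (1.0 + np.exp(-rewards))
    observed = rng.random(n) < p_observed

    true_mean = rewards.mean()
    # Naive estimate: average only the rewards that were recorded.
    complete_case_mean = rewards[observed].mean()
    return true_mean, complete_case_mean


true_mean, complete_case_mean = simulate_mnar_bias()
print(f"true mean reward:   {true_mean:.3f}")
print(f"complete-case mean: {complete_case_mean:.3f}")
```

Under this assumed mechanism the complete-case mean overstates the true mean, because high rewards are over-represented among the observed samples. Correcting this kind of gap without modeling the missingness mechanism directly is the problem the summarized method addresses.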
Ranking rationale: Academic paper detailing a novel methodology for a specific machine learning problem.
AI-generated summary · Google Gemini · From 1 source.