New OPE method tackles missing rewards in reinforcement learning

By PulseAugur Editorial · [1 sources] · 2026-06-18 13:19

Researchers have proposed a method for off-policy evaluation (OPE) in reinforcement learning when rewards are missing not at random (MNAR), meaning that whether a reward is recorded depends on the value of the reward itself. This problem, common in fields like healthcare and marketing, can bias policy evaluation. The approach uses future states as shadow variables to identify the true mean reward and introduces a bridge function to estimate it without explicitly modeling the missingness mechanism. The resulting estimator allows target policies to incorporate past missingness indicators, and it comes with theoretical guarantees on consistency and error bounds.
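To see why MNAR rewards bias an evaluation, consider a toy simulation. This is only a sketch of the problem, not the paper's shadow-variable or bridge-function estimator. The Gaussian reward distribution, the censoring threshold `cap`, and the function name `simulate_censored_rewards` are all illustrative assumptions, not details from the source.

```python
import random


def simulate_censored_rewards(n=10_000, cap=1.0, seed=0):
    """Compare the true mean reward with the mean of the recorded rewards.

    Assumption (hypothetical setup): rewards are Gaussian and a reward is
    logged only when it does not exceed `cap`, so missingness depends on
    the reward itself (MNAR).
    """
    rng = random.Random(seed)
    rewards = [rng.gauss(0.5, 1.0) for _ in range(n)]

    # Rewards above the cap are never recorded in the logged data.
    observed = [r for r in rewards if r <= cap]

    true_mean = sum(rewards) / len(rewards)
    # Naive "complete-case" estimate: average only what was recorded.
    naive_mean = sum(observed) / len(observed)
    return true_mean, naive_mean, len(observed)


true_mean, naive_mean, n_observed = simulate_censored_rewards()
print(f"true mean reward:     {true_mean:.3f}")
print(f"naive observed mean:  {naive_mean:.3f}")
print(f"rewards recorded:     {n_observed}")
```

Because every dropped reward is larger than every recorded one, the naive average is guaranteed to underestimate the true mean in this setup, regardless of sample size. An estimator that ignores the missingness mechanism inherits the same bias, which is the gap the summarized method aims to close.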

IMPACT This research could make policy evaluation more reliable in real-world reinforcement learning applications where reward data is incomplete.

RANK_REASON Academic paper detailing a novel methodology for a specific machine learning problem.



AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

arXiv stat.ML TIER_1 English(EN) · Rui Miao · 2026-06-18 13:19

Off-Policy Evaluation for Missingness-Aware Policies in MDPs with Rewards Missing Not at Random

In offline Reinforcement Learning, immediate rewards in logged batch data are often unobserved due to sparse or irregular record-keeping, or censored beyond certain reward values. This issue arises in practical settings, including health care and marketing. We investigate off-pol…

