Researchers have developed a new method for off-policy evaluation (OPE) in reinforcement learning settings where rewards are missing not at random (MNAR), meaning that whether a reward is recorded can depend on the unobserved reward itself. This problem is common in fields such as healthcare and marketing and can bias evaluations. The proposed approach uses future states as shadow variables to identify the true mean reward, and introduces a bridge function to estimate that mean without explicitly modeling the missingness mechanism. The authors then propose an estimator that allows target policies to depend on past missingness indicators, and provide theoretical guarantees of consistency along with error bounds.
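To make the shadow-variable idea concrete, the sketch below works through the simplest static case. It is an illustrative assumption, not the paper's estimator: a single step rather than a sequential RL setting, a binary reward Y, a binary shadow variable Z standing in for a "future state", a missingness indicator R that depends only on Y, and Z independent of R given Y. It also uses a weighting variant that recovers the inverse observation probabilities w(y) = 1/P(R=1 | Y=y) from the moment condition E[R * w(Y) | Z] = 1, whereas the paper's bridge function avoids modeling missingness explicitly. All function names and numbers are hypothetical.

```python
# Minimal sketch (hypothetical setup, not the paper's method): shadow-variable
# identification of a mean reward under MNAR in a single-step, discrete setting.
# Assumptions: binary reward Y, binary shadow variable Z, R depends only on Y,
# and Z is independent of R given Y.


def solve_2x2(a, b):
    """Solve the 2x2 linear system a @ x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if abs(det) < 1e-12:
        raise ValueError("singular system: Z carries no information about Y")
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return [x0, x1]


def observed_quantities(p_y, p_z_given_y, p_obs_given_y):
    """Population quantities an analyst could estimate from data.

    joint_obs[z][y] = P(R=1, Y=y, Z=z): Y is only seen when R=1.
    p_z[z] = P(Z=z): Z is always observed.
    """
    joint_obs = [
        [p_y[y] * p_z_given_y[y][z] * p_obs_given_y[y] for y in (0, 1)]
        for z in (0, 1)
    ]
    p_z = [sum(p_y[y] * p_z_given_y[y][z] for y in (0, 1)) for z in (0, 1)]
    return joint_obs, p_z


def naive_mean(joint_obs):
    """Complete-case mean E[Y | R=1], which is biased under MNAR."""
    p_obs = sum(sum(row) for row in joint_obs)
    return sum(row[1] for row in joint_obs) / p_obs


def shadow_mean(joint_obs, p_z):
    """Solve E[R * w(Y) | Z=z] = 1 for each z, then return E[R * w(Y) * Y].

    Multiplying the condition by P(Z=z) gives the linear system
    sum_y P(R=1, Y=y, Z=z) * w(y) = P(Z=z), solved here for w(0), w(1).
    """
    w = solve_2x2(joint_obs, p_z)
    # Y is binary, so only the y=1 column contributes to E[R * w(Y) * Y].
    mean = sum(joint_obs[z][1] * w[1] for z in (0, 1))
    return mean, w


# Hypothetical example: true mean reward is 0.5.
p_y = [0.5, 0.5]
p_z_given_y = [[0.8, 0.2], [0.3, 0.7]]  # Z is informative about Y
p_obs_given_y = [0.8, 0.4]  # high rewards are recorded less often (MNAR)

joint_obs, p_z = observed_quantities(p_y, p_z_given_y, p_obs_given_y)
print(naive_mean(joint_obs))  # about 0.333 (biased downward)
print(shadow_mean(joint_obs, p_z)[0])  # about 0.5
```

The linear system is solvable only when Z actually carries information about Y (a nonsingular matrix); this plays the role of the "completeness"-type condition that shadow-variable methods rely on.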
IMPACT This research could make policy evaluation more reliable in real-world reinforcement learning applications where reward data are incomplete.
RANK_REASON Academic paper detailing a novel methodology for a specific machine learning problem.
AI-generated summary · Google Gemini · from 1 source.