A new paper introduces PERRY, a method for constructing valid confidence intervals for off-policy evaluation (OPE) when auxiliary data, such as model-generated data, is available. This matters in high-stakes domains such as healthcare, where reliable uncertainty estimates are needed before reinforcement learning (RL) policies can be deployed safely. PERRY offers two methods, one for state-conditioned policy values and one for average policy performance, drawing on conformal prediction and doubly robust estimation. Experiments on several simulators and a real healthcare dataset show that PERRY makes effective use of auxiliary data while producing accurate confidence intervals.
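The summary names conformal prediction and doubly robust estimation without describing how PERRY combines them. As a rough, generic illustration only (this is not PERRY's algorithm), the Python sketch below shows two textbook building blocks. The first is split conformal prediction, which wraps an auxiliary model's predicted return in an interval. The second is a one-step (contextual-bandit) doubly robust estimator, which corrects model-based value estimates with importance-weighted residuals from logged real data and then attaches a normal-approximation confidence interval. The function names, parameters, the exchangeability assumption in the conformal step, and the normal approximation are all illustrative assumptions.

```python
import math
from statistics import NormalDist, fmean, stdev


def split_conformal_interval(cal_preds, cal_returns, new_pred, alpha=0.1):
    """Hypothetical helper: split-conformal interval around a model prediction.

    Assumes (prediction, observed return) calibration pairs are exchangeable
    with the new point. Off-policy data would need importance-weighted
    conformal prediction, which this sketch omits.
    """
    # Nonconformity score: absolute residual of the auxiliary model.
    scores = sorted(abs(y - f) for f, y in zip(cal_preds, cal_returns))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    return (new_pred - q, new_pred + q)


def doubly_robust_ci(model_target_values, model_logged_q, rewards, weights, alpha=0.05):
    """Hypothetical helper: one-step doubly robust value estimate with a normal CI.

    Per-sample term: qhat(s, pi_e) + w * (r - qhat(s, a)), where
    qhat(s, pi_e) is the model's value under the target policy, qhat(s, a)
    is its prediction for the logged action, and w = pi_e(a|s) / pi_b(a|s).
    Requires at least two samples for the standard deviation.
    """
    terms = [
        v + w * (r - q)
        for v, q, r, w in zip(model_target_values, model_logged_q, rewards, weights)
    ]
    n = len(terms)
    est = fmean(terms)
    # Half-width of a two-sided normal-approximation interval.
    half = NormalDist().inv_cdf(1 - alpha / 2) * stdev(terms) / math.sqrt(n)
    return est, (est - half, est + half)
```

For example, `split_conformal_interval([0] * 10, list(range(1, 11)), 5, alpha=0.25)` returns `(-4, 14)`, because the 9th smallest of the ten residuals is 9. The doubly robust term stays unbiased for the logged-data estimate when either the model or the importance weights are correct. The residual correction is what allows a possibly biased auxiliary model to be used without biasing the result.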
IMPACT Enables more reliable deployment of reinforcement learning policies in critical applications by providing valid confidence intervals for policy value estimates.
RANK_REASON The cluster contains an academic paper detailing a new method for reinforcement learning policy evaluation.
AI-generated summary · Google Gemini · from 1 source.