PulseAugur / Brief

Brief

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. Off-Policy Evaluation for Missingness-Aware Policies in MDPs with Rewards Missing Not at Random

    Researchers have proposed a method for off-policy evaluation (OPE) in reinforcement learning when rewards are missing not at random (MNAR), meaning the chance that a reward goes unrecorded depends on the value of that reward. The approach corrects the resulting selection bias by using future states as shadow variables to identify the full-data conditional mean reward. The estimator, inspired by Fitted-Q-Evaluation, lets target policies condition on past missingness indicators, and the authors report strong performance in experiments on simulated data and on MIMIC-III Sepsis data.

    IMPACT Makes the evaluation of reinforcement learning policies more reliable in real-world settings where reward data is incomplete.
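
    For readers unfamiliar with the baseline the summary mentions, the following is a minimal sketch of plain tabular Fitted-Q-Evaluation in Python. It is not the paper's estimator: it assumes every reward is observed and contains no shadow-variable correction for MNAR rewards. The function name `fitted_q_evaluation`, its parameters, and the toy MDP are hypothetical and chosen only for illustration.

```python
import numpy as np


def fitted_q_evaluation(transitions, target_policy, n_states, n_actions,
                        gamma=0.9, n_iters=200):
    """Plain tabular FQE (illustrative sketch, fully observed rewards assumed).

    transitions   : list of (state, action, reward, next_state) tuples
    target_policy : array of shape (n_states, n_actions) with action probabilities
    Returns an (n_states, n_actions) array estimating Q under the target policy.
    """
    s, a, r, s_next = (np.array(col) for col in zip(*transitions))
    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        # Expected next-state value under the target policy.
        v_next = (target_policy[s_next] * q[s_next]).sum(axis=1)
        targets = r + gamma * v_next
        # Tabular "regression": average the targets for each (state, action).
        sums = np.zeros_like(q)
        counts = np.zeros_like(q)
        np.add.at(sums, (s, a), targets)
        np.add.at(counts, (s, a), 1)
        # Pairs never seen in the data keep a value of zero.
        q = np.divide(sums, counts, out=np.zeros_like(q), where=counts > 0)
    return q


# Toy MDP (hypothetical): state 0 moves to state 1 with reward 1,
# state 1 loops on itself with reward 0; a single action is available.
toy_transitions = [(0, 0, 1.0, 1), (1, 0, 0.0, 1)]
toy_policy = np.ones((2, 1))
toy_q = fitted_q_evaluation(toy_transitions, toy_policy,
                            n_states=2, n_actions=1, gamma=0.5)
print(toy_q)
```

    If transitions with unrecorded rewards were simply dropped before calling a routine like this, the averaged targets would be computed on a selected subset of the data, which is the selection bias the summary says the proposed method is designed to correct.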