Brief · PulseAugur

RESEARCH · arXiv cs.AI · English (EN) · 2w · [2 sources]

Quotient DAGs for Off-Policy Evaluation: Forward-Flow Importance Sampling and Exact Slate Propensities

Two new research papers explore techniques for off-policy evaluation (OPE), the task of estimating how a new policy would perform using only data logged under an existing one. The first paper introduces "Quotient DAGs" for settings where the reward depends only on an unordered set of items even though the items are generated in order, which reduces the nuisance variance contributed by the ordering. The second paper, "CANDOR," proposes a doubly robust OPE estimator that makes use of imperfect expert-annotated counterfactual samples, particularly for healthcare applications, by incorporating the annotations into the estimator's reward model component.
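To make the first paper's setting concrete, here is a minimal Python sketch of a set-level propensity. It is an illustration under stated assumptions, not the paper's algorithm: the policy is assumed to draw items one at a time without replacement, with probability proportional to fixed weights (a Plackett-Luce-style sampler), and the names `ordered_prob`, `set_prob`, and `set_importance_weight` are hypothetical. The probability of an unordered set is accumulated in a forward pass over subsets, where each subset receives probability mass from the subsets that are one item smaller.

```python
from itertools import combinations


def ordered_prob(seq, weights):
    """Probability of drawing `seq` in exactly this order.

    Assumption: items are sampled without replacement, each draw
    proportional to its weight among the items not yet drawn.
    """
    remaining = sum(weights.values())
    prob = 1.0
    for item in seq:
        prob *= weights[item] / remaining
        remaining -= weights[item]
    return prob


def set_prob(items, weights):
    """Probability that the first len(items) draws form exactly this set.

    Forward pass over subsets: flow[S] sums, over each possible last
    item, the flow into the parent subset times the chance of drawing
    that item next.
    """
    items = tuple(items)
    total = sum(weights.values())
    flow = {frozenset(): 1.0}
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            current = frozenset(subset)
            mass = 0.0
            for last in current:
                parent = current - {last}
                remaining = total - sum(weights[i] for i in parent)
                mass += flow[parent] * weights[last] / remaining
            flow[current] = mass
    return flow[frozenset(items)]


def set_importance_weight(items, target_weights, logging_weights):
    """Ratio of set-level probabilities under the target and logging policies."""
    return set_prob(items, target_weights) / set_prob(items, logging_weights)
```

Under these assumptions, the set-level weight equals the average of the ordered importance weights over all orderings that yield the same set, so when the reward ignores order it should not have higher variance than the ordered weight. The forward pass visits every subset of the slate, so this sketch is only practical for small slates.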

IMPACT These papers introduce off-policy evaluation methods that could improve the reliability and safety of deploying new policies in critical domains such as healthcare.

healthcare
electronic health records
importance sampling
reward model
off-policy evaluation
Quotient DAGs
Exact Slate Propensities
Forward-Flow Importance Sampling
Counterfactual ANnotated DOubly Robust Off-Policy Evaluation