PulseAugur

New research advances off-policy evaluation techniques in machine learning

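Two new research papers explore advanced techniques for off-policy evaluation (OPE) in machine learning, the process of estimating how a new policy would perform using only data that has already been collected. The first paper introduces "Quotient DAGs" for settings where the reward depends only on an unordered set of items while the generation process is ordered, which reduces nuisance variance. The second paper, "CANDOR", proposes a doubly robust OPE estimator that makes effective use of imperfect, expert-annotated counterfactual samples by incorporating the annotations into the reward-model component, with healthcare applications particularly in mind.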
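For readers unfamiliar with the terms, the following is a minimal sketch of the two textbook estimators these papers build on: inverse propensity scoring (importance sampling) and the standard doubly robust estimator for contextual bandits. It is not the Quotient DAG method or the CANDOR estimator. The toy two-action problem, the function names (`ips_estimate`, `dr_estimate`), and the deliberately crude constant reward model are all assumptions made for illustration.

```python
import random

def ips_estimate(logs, target_prob):
    """Inverse propensity scoring: reweight each logged reward by pi_target / pi_behavior."""
    total = 0.0
    for context, action, reward, behavior_prob in logs:
        weight = target_prob(context, action) / behavior_prob
        total += weight * reward
    return total / len(logs)

def dr_estimate(logs, target_prob, reward_model, actions):
    """Doubly robust: model-based baseline plus an importance-weighted residual correction."""
    total = 0.0
    for context, action, reward, behavior_prob in logs:
        # Expected reward under the target policy according to the reward model.
        baseline = sum(target_prob(context, a) * reward_model(context, a) for a in actions)
        weight = target_prob(context, action) / behavior_prob
        total += baseline + weight * (reward - reward_model(context, action))
    return total / len(logs)

# Hypothetical toy problem: two actions, no informative context.
ACTIONS = [0, 1]
TRUE_MEAN = {0: 0.2, 1: 0.8}  # assumed Bernoulli reward means

def target_prob(context, action):
    # Target policy prefers action 1.
    return 0.9 if action == 1 else 0.1

def reward_model(context, action):
    # Deliberately imperfect reward model: a constant guess.
    return 0.5

rng = random.Random(0)
logs = []
for _ in range(20000):
    action = rng.choice(ACTIONS)  # uniform behavior policy, propensity 0.5
    reward = 1.0 if rng.random() < TRUE_MEAN[action] else 0.0
    logs.append((None, action, reward, 0.5))

true_value = sum(target_prob(None, a) * TRUE_MEAN[a] for a in ACTIONS)
ips_value = ips_estimate(logs, target_prob)
dr_value = dr_estimate(logs, target_prob, reward_model, ACTIONS)
print(true_value, ips_value, dr_value)
```

Because the behavior propensities are known exactly here, both estimators are unbiased for the target policy's value even though the reward model is wrong; the doubly robust form typically has lower variance when the reward model is reasonable.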

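Impact: These papers introduce novel off-policy evaluation methods that could improve the reliability and safety of deploying new policies in critical domains such as healthcare.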

Ranking rationale: Two academic papers published on arXiv that detail novel methods for off-policy evaluation.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.AI TIER_1 English(EN) · Ziwen Xie, Shaowen Xiang, Hongyu He, Dianbo Liu ·

    Quotient DAGs for Off-Policy Evaluation: Forward-Flow Importance Sampling and Exact Slate Propensities

    arXiv:2605.29500v1 Announce Type: cross Abstract: Off-policy evaluation estimates how a target policy would perform using data collected by a different behavior policy, which is crucial when online testing is costly or risky, such as in recommendation or healthcare. Standard impo…

  2. arXiv stat.ML TIER_1 English(EN) · Aishwarya Mandyam, Shengpu Tang, Jiayu Yao, Jenna Wiens, Barbara E. Engelhardt ·

    CANDOR: Counterfactual ANnotated DOubly Robust Off-Policy Evaluation

    arXiv:2412.08052v2 Announce Type: replace-cross Abstract: Off-policy evaluation (OPE) is critical for applying contextual bandit algorithms to high-stakes decision-making settings such as healthcare, where new treatment policies must be evaluated prior to deployment. Unfortunatel…