PulseAugur

New DQPOPE algorithm estimates full return distribution for off-policy evaluation

Researchers have introduced DQPOPE, an algorithm for off-policy evaluation (OPE) that estimates the entire return distribution rather than only its expected value. The method is built on deep quantile process regression and comes with theoretical results on estimating continuous quantile functions. The paper also gives a rigorous sample complexity analysis for distributional OPE with deep neural networks, and reports statistical advantages and more accurate policy value estimates than conventional methods.
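To make the idea of estimating a return distribution through its quantiles more concrete, here is a minimal sketch of plain quantile regression with the pinball (check) loss. It is not the DQPOPE algorithm and uses no neural network or off-policy correction; the function names, the grid search, and the synthetic returns are all assumptions made for illustration.

```python
import numpy as np

def pinball_loss(returns, q_hat, tau):
    """Average pinball (check) loss of a candidate quantile q_hat at level tau."""
    diff = np.asarray(returns, dtype=float) - q_hat
    # Under-estimates are weighted by tau, over-estimates by (1 - tau).
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))

def fit_quantiles(returns, taus, grid):
    """For each level tau, pick the grid value with the smallest pinball loss.

    A hypothetical stand-in for a learned quantile model: a deep model would
    minimize the same kind of loss by gradient descent instead of grid search.
    """
    fitted = []
    for tau in taus:
        losses = [pinball_loss(returns, g, tau) for g in grid]
        fitted.append(grid[int(np.argmin(losses))])
    return np.array(fitted)

# Synthetic returns for illustration only (not data from the paper).
rng = np.random.default_rng(0)
returns = rng.normal(loc=1.0, scale=2.0, size=5000)

taus = np.array([0.1, 0.5, 0.9])
grid = np.linspace(-6.0, 8.0, 281)  # candidate quantile values, step 0.05
quantiles = fit_quantiles(returns, taus, grid)
print(dict(zip(taus.tolist(), quantiles.tolist())))
```

The fitted values describe the lower tail, median, and upper tail of the returns, which is the kind of information a single expected value cannot provide.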

Summary written by gemini-2.5-flash-lite from 3 sources.

RANK_REASON This is a research paper published on arXiv detailing a new algorithm and theoretical analysis.

COVERAGE [3]

  1. arXiv stat.ML TIER_1 · Qi Kuang, Chao Wang, Yuling Jiao, Fan Zhou ·

    Distributional Off-Policy Evaluation with Deep Quantile Process Regression

    arXiv:2604.18143v2. Abstract: This paper investigates the off-policy evaluation (OPE) problem from a distributional perspective. Rather than focusing solely on the expectation of the total return, as in most existing OPE methods, we aim to estimate the entir…

  2. arXiv stat.ML TIER_1 · Fan Zhou ·

    Distributional Off-Policy Evaluation with Deep Quantile Process Regression

    This paper investigates the off-policy evaluation (OPE) problem from a distributional perspective. Rather than focusing solely on the expectation of the total return, as in most existing OPE methods, we aim to estimate the entire return distribution. To this end, we introduce a q…

  3. arXiv stat.ML TIER_1 · Fan Zhou ·

    Distributional Off-Policy Evaluation with Deep Quantile Process Regression

    This paper investigates the off-policy evaluation (OPE) problem from a distributional perspective. Rather than focusing solely on the expectation of the total return, as in most existing OPE methods, we aim to estimate the entire return distribution. To this end, we introduce a q…