PulseAugur

New method improves continuous-time policy evaluation with high-order regression

Researchers have proposed a method for continuous-time policy evaluation based on high-order generator regression. Where the Bellman baseline relies on a one-step recursion, the new approach uses multi-step transitions to estimate the time-dependent generator more accurately. The method offers an interpretable framework with a clearly defined operating region and shows consistent performance gains in calibration studies.

Summary written by gemini-2.5-flash-lite from 1 source.



COVERAGE [1]

  1. arXiv stat.ML · Yichi Zhang

    Beyond Bellman: High-Order Generator Regression for Continuous-Time Policy Evaluation

    We study finite-horizon continuous-time policy evaluation from discrete closed-loop trajectories under time-inhomogeneous dynamics. The target value surface solves a backward parabolic equation, but the Bellman baseline obtained from one-step recursion is only first-order in the …
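
To make the "first-order" remark concrete, here is a minimal sketch of why a one-step difference recovers a generator only to first order in the step size, while a combination of one-step and two-step transitions can do better. It is a generic textbook illustration and not the paper's estimator. Everything in it is an assumption chosen for illustration: a time-homogeneous Ornstein-Uhlenbeck process (the paper treats time-inhomogeneous dynamics), the test function f(x) = x^2, exact conditional expectations in place of regression on data, and all function names.

```python
import math


def ou_second_moment(x, dt, theta=1.0, sigma=0.5):
    """Exact E[X_{t+dt}^2 | X_t = x] for dX = -theta * X dt + sigma dW."""
    decay = math.exp(-2.0 * theta * dt)
    return x * x * decay + sigma ** 2 / (2.0 * theta) * (1.0 - decay)


def generator_exact(x, theta=1.0, sigma=0.5):
    """Generator applied to f(x) = x^2: -theta*x*f'(x) + 0.5*sigma^2*f''(x)."""
    return -2.0 * theta * x * x + sigma ** 2


def generator_one_step(x, dt):
    """One-step difference (E[f(X_dt)] - f(x)) / dt; error is O(dt)."""
    return (ou_second_moment(x, dt) - x * x) / dt


def generator_two_step(x, dt):
    """Combination of one- and two-step transitions; error is O(dt^2).

    Uses the weights (-3, 4, -1) / (2 * dt), which cancel the dt^2 term
    in the expansion of the transition operator.
    """
    one = ou_second_moment(x, dt)
    two = ou_second_moment(x, 2.0 * dt)
    return (-3.0 * x * x + 4.0 * one - two) / (2.0 * dt)


def errors(x, dt):
    """Absolute errors of both estimates against the exact generator."""
    exact = generator_exact(x)
    return (
        abs(generator_one_step(x, dt) - exact),
        abs(generator_two_step(x, dt) - exact),
    )


if __name__ == "__main__":
    for dt in (0.1, 0.05, 0.025):
        e1, e2 = errors(1.0, dt)
        print(f"dt={dt:<6} one-step error={e1:.5f} two-step error={e2:.5f}")
```

Halving the step roughly halves the one-step error and roughly quarters the two-step error, which is the sense in which the multi-step combination is higher-order. In practice the conditional expectations would have to be estimated from trajectories, which this sketch deliberately leaves out.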