New method improves continuous-time policy evaluation with high-order regression

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 1 source

Researchers have proposed a method for continuous-time policy evaluation based on high-order generator regression. Where the standard Bellman baseline relies on a one-step recursion, the new approach uses multi-step transitions to estimate the time-dependent generator more accurately. The authors present it as an interpretable framework with a clear operating region and report consistent gains over the baseline in calibration studies.
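To make the "one-step versus multi-step" contrast concrete, here is a minimal Python sketch. It is not the paper's estimator. It is a deterministic toy, chosen for this illustration, in which the dynamics are dx/dt = -x and the test function is f(x) = x^2, so the generator value is known exactly. The function names, the toy dynamics, and the choice to measure accuracy in the step size `h` are all assumptions of the sketch. It compares a one-step difference quotient with a standard second-order forward difference that also uses the transition at 2h.

```python
import math


def flow(x, t):
    """Exact solution of the toy dynamics dx/dt = -x after time t (assumed model)."""
    return x * math.exp(-t)


def f(x):
    """Test function whose generator value we want to estimate."""
    return x * x


def generator_exact(x):
    # For dx/dt = b(x) = -x, the generator is (Lf)(x) = f'(x) * b(x) = -2 x^2.
    return -2.0 * x * x


def one_step_estimate(x, h):
    # One-step difference quotient: first-order accurate in h.
    return (f(flow(x, h)) - f(x)) / h


def two_step_estimate(x, h):
    # Standard second-order forward difference using transitions at h and 2h.
    return (-3.0 * f(x) + 4.0 * f(flow(x, h)) - f(flow(x, 2.0 * h))) / (2.0 * h)


def errors(x, h):
    """Absolute errors of the one-step and two-step estimates at state x."""
    exact = generator_exact(x)
    return (
        abs(one_step_estimate(x, h) - exact),
        abs(two_step_estimate(x, h) - exact),
    )


if __name__ == "__main__":
    for h in (0.1, 0.05, 0.025):
        e1, e2 = errors(1.0, h)
        print(f"h={h:<6} one-step error={e1:.2e}  two-step error={e2:.2e}")
```

In this toy, halving `h` roughly halves the one-step error and roughly quarters the two-step error, which is the usual signature of first-order versus second-order accuracy. The paper's setting (stochastic, time-inhomogeneous, estimated by regression from trajectories) is considerably richer than this sketch.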





COVERAGE [1]

arXiv stat.ML TIER_1 · Yichi Zhang · 2026-04-21 01:53

Beyond Bellman: High-Order Generator Regression for Continuous-Time Policy Evaluation

We study finite-horizon continuous-time policy evaluation from discrete closed-loop trajectories under time-inhomogeneous dynamics. The target value surface solves a backward parabolic equation, but the Bellman baseline obtained from one-step recursion is only first-order in the …

