PulseAugur
research · 4 sources

Q-MMR framework offers novel approach to off-policy evaluation

Researchers have introduced Q-MMR, a new theoretical framework for off-policy evaluation in Markov decision processes (MDPs). The method learns a scalar weight for each data point so that the reweighted rewards approximate the expected return under a target policy, using a moment-matching objective. A key result is a data-dependent, dimension-free finite-sample guarantee for general function approximation, notable because the bound does not depend on the complexity of the function class.
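The paper's exact recursive, multi-step construction is not reproduced here, but the core idea the summary describes, learning one weight per logged data point so that reweighted quantities match target-policy moments, can be illustrated with a minimal one-step sketch. Everything below (the linear-feature setup, policies, and variable names) is our own toy assumption, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-step setting with linear rewards (a stand-in for the paper's
# finite-horizon MDPs; the recursive multi-step construction is omitted).
n, d_s = 2000, 4
S = rng.normal(size=(n, d_s))          # logged states
A_beh = rng.integers(0, 2, size=n)     # behavior policy: uniform over {0, 1}

def phi(S, A):
    """State-action features: the state vector placed in the chosen action's slot."""
    out = np.zeros((len(A), 2 * d_s))
    out[A == 0, :d_s] = S[A == 0]
    out[A == 1, d_s:] = S[A == 1]
    return out

theta = rng.normal(size=2 * d_s)       # true reward parameter (unknown to the estimator)
Phi = phi(S, A_beh)                    # features of the logged state-action pairs
r = Phi @ theta                        # logged rewards

# Target policy: deterministically take action 1.
m = phi(S, np.ones(n, dtype=int)).mean(axis=0)   # empirical target feature moments

# One scalar weight per data point, chosen so the reweighted features
# match the target moments: Phi.T @ w ~ m (ridge-regularized min-norm solution).
lam = 1e-6
w = Phi @ np.linalg.solve(Phi.T @ Phi + lam * np.eye(2 * d_s), m)

estimate = w @ r                       # reweighted-reward estimate of the target value
true_value = m @ theta                 # ground truth in this linear toy problem
print(estimate, true_value)
```

Because the toy rewards are exactly linear in the features, the moment-matched estimate recovers the target value here; the paper's contribution is the guarantee for the general function-approximation, multi-step case.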

Summary written by gemini-2.5-flash-lite from 4 sources.

IMPACT Introduces a novel theoretical framework for off-policy evaluation, potentially enabling more reliable assessment of reinforcement learning policies from logged data before deployment.

RANK_REASON The cluster contains an academic paper detailing a new theoretical framework for a machine learning problem.

Read on Hugging Face Daily Papers →

COVERAGE [4]

  1. arXiv cs.LG TIER_1 · Xiang Li, Nan Jiang

    Q-MMR: Off-Policy Evaluation via Recursive Reweighting and Moment Matching

    arXiv:2605.06474v1 · Abstract: We present a novel theoretical framework, Q-MMR, for off-policy evaluation in finite-horizon MDPs. Q-MMR learns a set of scalar weights, one for each data point, such that the reweighted rewards approximate the expected return under…

  2. Hugging Face Daily Papers TIER_1

    Q-MMR: Off-Policy Evaluation via Recursive Reweighting and Moment Matching

    We present a novel theoretical framework, Q-MMR, for off-policy evaluation in finite-horizon MDPs. Q-MMR learns a set of scalar weights, one for each data point, such that the reweighted rewards approximate the expected return under the target policy. The weights are learned indu…

  3. arXiv stat.ML TIER_1 · Nan Jiang

    Q-MMR: Off-Policy Evaluation via Recursive Reweighting and Moment Matching

    We present a novel theoretical framework, Q-MMR, for off-policy evaluation in finite-horizon MDPs. Q-MMR learns a set of scalar weights, one for each data point, such that the reweighted rewards approximate the expected return under the target policy. The weights are learned indu…