New FQE and FQI Methods Bypass Bellman Completeness to Achieve Stability

By PulseAugur Editorial · [2 sources] · 2026-05-11 04:00

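Researchers have developed new fitted Q-evaluation (FQE) and soft fitted Q-iteration (soft FQI) methods that do not require Bellman completeness, a condition that often fails under function approximation. The proposed techniques, stationary-weighted FQE and stationary-reweighted soft FQI, address instability by reweighting the regression step to match the stationary distribution of the target policy. The methods aim to improve stability and reduce value error in off-policy evaluation for reinforcement learning.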
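To make the reweighting idea concrete, here is a minimal population-level sketch of FQE with linear features in which each regression step is weighted by the ratio of the target policy's stationary distribution to the data distribution. It is an illustration under strong assumptions, not the papers' implementation: the transition matrix `P_pi` is taken as known so the ratio can be computed exactly, and the names `stationary_distribution`, `weighted_fqe`, `Phi`, `mu`, and `d_pi` are hypothetical. With real offline data the ratio would have to be estimated, a step this sketch omits.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution of a row-stochastic matrix P (assumed ergodic)."""
    evals, evecs = np.linalg.eig(P.T)
    # The eigenvalue with the largest real part is 1 for a stochastic matrix.
    v = np.real(evecs[:, np.argmax(np.real(evals))])
    return v / v.sum()

def weighted_fqe(Phi, P_pi, r, mu, weights, gamma=0.9, n_iters=300):
    """Population-level FQE with linear features and per-point regression weights.

    Phi:     (n, d) feature matrix over state-action pairs
    P_pi:    (n, n) transition matrix of the chain induced by the target policy
    r:       (n,) expected rewards
    mu:      (n,) data (behavior) distribution over state-action pairs
    weights: (n,) regression weights; d_pi / mu gives the stationary weighting
    """
    D = np.diag(mu * weights)            # effective regression distribution
    G = Phi.T @ D @ Phi                  # weighted Gram matrix
    theta = np.zeros(Phi.shape[1])
    for _ in range(n_iters):
        target = r + gamma * P_pi @ (Phi @ theta)       # Bellman backup of current Q
        theta = np.linalg.solve(G, Phi.T @ D @ target)  # weighted least squares
    return theta

# Hypothetical toy problem: 6 state-action pairs, 2 features.
rng = np.random.default_rng(0)
n, d = 6, 2
P_pi = rng.random((n, n))
P_pi /= P_pi.sum(axis=1, keepdims=True)
Phi = rng.normal(size=(n, d))
r = rng.normal(size=n)
mu = rng.random(n)
mu /= mu.sum()

d_pi = stationary_distribution(P_pi)
theta = weighted_fqe(Phi, P_pi, r, mu, weights=d_pi / mu)
print(theta)
```

The design choice rests on a classical fact: the Bellman evaluation operator composed with a projection is a contraction in the norm weighted by the stationary distribution of the target policy's chain, so the weighted iteration settles at a fixed point even when the feature class is not closed under the Bellman operator. Passing `weights=np.ones(n)` gives the unweighted variant for comparison, which carries no such guarantee.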

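Impact: Strengthens the theoretical foundations of off-policy evaluation in reinforcement learning, potentially improving model training and decision-making in complex environments.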

Ranking rationale: Two arXiv papers introduce novel theoretical methods for reinforcement learning evaluation.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv stat.ML TIER_1 English(EN) · Lars van der Laan, Nathan Kallus · 2026-05-11 04:00

Fitted $Q$-Evaluation Without Bellman Completeness via Stationary Weighting

arXiv:2512.23805v3. Abstract: Fitted $Q$-evaluation (FQE) is a standard regression-based tool for off-policy evaluation, but existing stability guarantees often rely on Bellman completeness, a strong closure condition that can fail under function approximati…

arXiv stat.ML TIER_1 English(EN) · Lars van der Laan, Nathan Kallus · 2026-05-11 04:00

Stationary Reweighting Yields Local Convergence of Soft Fitted Q-Iteration

arXiv:2512.23927v2. Abstract: Fitted $Q$-iteration (FQI) and soft FQI are widely used value-based methods for offline reinforcement learning, but their standard stability guarantees often depend on Bellman completeness, a strong closure condition that can fa…
