Beyond the Bellman Recursion: A Pontryagin-Guided Framework for Non-Exponential Discounting

New PG-DPO framework strengthens reinforcement learning under non-exponential discounting

By the PulseAugur editorial team · [2 sources] · 2026-05-20 10:36

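Researchers have developed a new framework, Pontryagin-Guided Direct Policy Optimization (PG-DPO), to address a limitation of reinforcement learning methods. Conventional methods built on Bellman-style recursions break down under non-exponential discounting, which is common when modeling human preferences and survival processes. PG-DPO abandons the recursion altogether and instead combines Pontryagin's Maximum Principle with Monte Carlo rollouts, achieving higher accuracy and stability on specialized benchmarks.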
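Why a Bellman-style recursion depends on exponential discounting can be seen with a few lines of arithmetic. The sketch below is an illustration only, not PG-DPO and not the paper's code. It assumes hyperbolic discounting, D(t) = 1/(1 + k t), as the non-exponential example, and all function names are hypothetical. A backward recursion G_t = r_t + c G_{t+1} reuses one constant factor c at every step, which in discrete time matches the direct sum of D(t) r_t only when D(t+1) = c D(t) for all t, i.e. when D is exponential. A Monte Carlo estimate that sums D(t) r_t along each complete rollout needs no recursion at all.

```python
import numpy as np

def exponential_discount(t, gamma=0.9):
    # D(t) = gamma**t: the ratio D(t + 1) / D(t) is the same at every step
    return gamma ** np.asarray(t, dtype=float)

def hyperbolic_discount(t, k=1.0):
    # D(t) = 1 / (1 + k t): a standard non-exponential example (assumed here for illustration)
    return 1.0 / (1.0 + k * np.asarray(t, dtype=float))

def direct_return(rewards, discount):
    # Discounted return from time 0: sum of D(t) * r_t over the whole trajectory
    t = np.arange(len(rewards))
    return float(np.sum(discount(t) * rewards))

def bellman_return(rewards, discount):
    # Backward recursion G_t = r_t + c * G_{t+1} with a single factor c = D(1) / D(0)
    c = float(discount(1) / discount(0))
    g = 0.0
    for r in reversed(rewards):
        g = r + c * g
    return g

def monte_carlo_value(rollouts, discount):
    # Average the directly summed return over complete rollouts; no recursion involved
    return float(np.mean([direct_return(r, discount) for r in rollouts]))

rewards = np.ones(5)
for name, d in [("exponential", exponential_discount), ("hyperbolic", hyperbolic_discount)]:
    print(name, round(direct_return(rewards, d), 4), round(bellman_return(rewards, d), 4))
# exponential 4.0951 4.0951
# hyperbolic 2.2833 1.9375

# Hypothetical noisy rollouts: the rollout-based estimate works for either discount
rng = np.random.default_rng(0)
rollouts = [rng.normal(1.0, 0.1, size=5) for _ in range(1000)]
print(monte_carlo_value(rollouts, hyperbolic_discount))
```

For exponential discounting the two computations agree exactly; in the hyperbolic case the fixed-factor recursion discounts later rewards too steeply, which is one concrete way to see why a recursion-free, rollout-based estimator is attractive in this setting.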

Impact: Introduces a novel reinforcement learning approach that could improve the modeling of complex decision-making processes.

Ranking rationale: This cluster contains an academic paper detailing a new reinforcement learning framework.

AI-generated summary · Google Gemini · from 2 sources.

Coverage sources [2]

arXiv cs.LG (Tier 1) · Jeonggyu Huh · 2026-05-20 10:36

Beyond the Bellman Recursion: A Pontryagin-Guided Framework for Non-Exponential Discounting

Most value-based and actor-critic reinforcement learning methods rely on Bellman-style recursions, yet these recursions collapse under non-exponential discounting common in human preferences and survival processes. We show the breakdown is structural: exponential discounting sit…
Hugging Face Daily Papers (Tier 1) · 2026-05-20 10:36
