Hybrid Sequence Modeling and Reinforced Verification for Controllable Target-Conditioned Decision Making

New Doctor framework strengthens controllable decision making through reinforced verification

By the PulseAugur editorial team · [1 source] · 2026-06-24 04:00

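Researchers have developed Doctor, a new framework that combines sequence modeling with reinforced verification for controllable offline decision making. The approach addresses the unreliability of the target return as a control signal, especially in regions that are underrepresented in the data. Doctor uses a masked trajectory Transformer trained on reconstruction and value-learning objectives. At inference time, it generates multiple candidate actions and selects the one whose verified value is closest to the requested target, which improves controllability while remaining competitive on standard benchmarks.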
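The inference-time selection step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `select_verified_action`, the one-dimensional actions, and the hand-written `toy_value` estimator are all hypothetical stand-ins for the candidate sampler and learned value head that the paper describes.

```python
import random
from typing import Callable, Sequence


def select_verified_action(
    candidates: Sequence[float],
    value_fn: Callable[[float], float],
    target_return: float,
) -> float:
    """Return the candidate whose estimated value is closest to the target.

    Hypothetical helper: in the paper, candidates would come from the
    sequence model and value_fn from its learned value estimates.
    """
    if not candidates:
        raise ValueError("need at least one candidate action")
    # Score every candidate, then keep the one with the smallest gap
    # between its estimated value and the requested target return.
    return min(candidates, key=lambda a: abs(value_fn(a) - target_return))


def toy_value(action: float) -> float:
    """Toy value estimate (assumption): return grows linearly with the action."""
    return 10.0 * action


# Sample a handful of candidate actions with a fixed seed for repeatability.
rng = random.Random(0)
candidates = [rng.uniform(0.0, 1.0) for _ in range(8)]

chosen = select_verified_action(candidates, toy_value, target_return=5.0)
print(f"chosen action {chosen:.3f}, estimated value {toy_value(chosen):.3f}")
```

The point of the sketch is the decoupling: the requested target is used to rank candidates by a separate value estimate instead of being trusted as a direct conditioning signal, which is the motivation the summary gives for the verification step.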

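Impact: This research could lead to more reliable and controllable AI systems in domains that require precise decision making based on a target return.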

Ranking rationale: This is a research paper detailing a new decision-making framework.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 1 source. How we write summaries →

Sources [1]

arXiv cs.LG · Yue Pei, Hongming Zhang, Chao Gao, Martin Müller, Yingying Zhang, Mengxiao Zhu, Hao Sheng, Ziliang Chen, Liang Lin, Haogang Zhu · 2026-06-24 04:00

Hybrid Sequence Modeling and Reinforced Verification for Controllable Target-Conditioned Decision Making

arXiv:2508.16420v3 Announce Type: replace Abstract: Target-conditioned sequence models provide a simple interface for controllable offline decision making, but the requested target return can be an unreliable control signal, especially when the target return lies in underrepresen…
