DeepVerifier study introduces self-evolving AI agents through test-time verification

By PulseAugur Editorial · [1 source] · 2026-04-30 04:00

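Researchers have developed DeepVerifier, a system that strengthens Deep Research Agents (DRAs) by letting them improve themselves at inference time. It does this through a rubric-based verification process in which the agent evaluates its own outputs against a structured taxonomy of potential failures. The system outperforms baseline methods by up to 48% in meta-evaluation F1 score and delivers accuracy gains of 8-11% on challenging benchmarks. To support the research community, the authors have also released a dataset of 4,646 verification-focused agent steps.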
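The verify-then-retry idea described above can be illustrated with a minimal sketch. This is not the paper's implementation: the rubric items, the `toy_agent` generator, and the `solve_with_verification` loop are all hypothetical names invented for illustration, and a real verifier would likely use a language model rather than string checks.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class RubricItem:
    """One failure category plus a check that returns True when the answer passes."""
    name: str
    passes: Callable[[str], bool]


@dataclass
class Verdict:
    accepted: bool
    failed: List[str]  # names of the rubric items the answer violated


def verify(answer: str, rubric: List[RubricItem]) -> Verdict:
    """Score an answer against every rubric item and collect the failures."""
    failed = [item.name for item in rubric if not item.passes(answer)]
    return Verdict(accepted=not failed, failed=failed)


def solve_with_verification(
    generate: Callable[[List[str]], str],
    rubric: List[RubricItem],
    max_rounds: int = 3,
) -> Tuple[str, Verdict]:
    """Generate, verify, and retry with the failed rubric items as feedback."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback: List[str] = []
    for _ in range(max_rounds):
        answer = generate(feedback)
        verdict = verify(answer, rubric)
        if verdict.accepted:
            break
        feedback = verdict.failed  # tell the agent which checks it failed
    return answer, verdict


# Hypothetical rubric: two toy failure categories checked with plain string tests.
RUBRIC = [
    RubricItem("empty_answer", lambda a: bool(a.strip())),
    RubricItem("missing_citation", lambda a: "[source]" in a),
]


def toy_agent(feedback: List[str]) -> str:
    """Stand-in for an agent: adds a citation only after being told it is missing."""
    return "Paris [source]" if "missing_citation" in feedback else "Paris"


answer, verdict = solve_with_verification(toy_agent, RUBRIC)
print(answer, verdict.accepted)
```

In this toy run the first draft fails the citation check, the failure name is fed back, and the second draft is accepted. No model weights change at any point, which is the sense in which the improvement happens at inference time.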

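Impact: Introduces a new method for self-improving AI agents at inference time, with the potential to raise performance on complex tasks without additional training.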

Ranking rationale: This is an academic paper detailing a new method for improving AI agents.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI TIER_1 English(EN) · Yuxuan Wan, Tianqing Fang, Zaitang Li, Yintong Huo, Wenxuan Wang, Haitao Mi, Dong Yu, Michael R. Lyu · 2026-04-30 04:00

Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification

arXiv:2601.15808v2 Announce Type: replace Abstract: Recent advances in Deep Research Agents (DRAs) are transforming automated knowledge discovery and problem-solving. While the majority of existing efforts focus on enhancing policy capabilities via post-training, we propose an al…
