PulseAugur
实时 11:31:18
English(EN) DRACULA: Hunting for the Actions Users Want Deep Research Agents to Execute

DRACULA数据集引入了用户对研究代理中间操作的反馈

研究人员推出了DRACULA,一个旨在通过捕获用户对中间操作的反馈来改进深度研究代理的新数据集。该数据集收集自19位计算机科学领域专家研究员,包含与综合研究论文相关的8,000多个操作偏好和5,000个执行判断。初步研究结果表明,大型语言模型在预测用户偏好的操作时存在困难,尤其是在用户有未说明的目标时,这凸显了对长时程代理需要更好的操作反馈机制。 AI

影响 为训练和评估AI代理执行复杂、多步骤任务提供了新数据集。

排序理由 该集群描述了在arXiv上发布的新数据集和论文。

在 arXiv cs.CL 阅读 →

AI 生成摘要 · Google Gemini · 来自 2 个来源。 我们如何撰写摘要 →

DRACULA数据集引入了用户对研究代理中间操作的反馈

报道来源 [2]

  1. arXiv cs.CL TIER_1 English(EN) · Nishant Balepur, Malachi Hamada, Varsha Kishore, Sergey Feldman, Amanpreet Singh, Pao Siangliulue, Joseph Chee Chang, Rachel Rudinger, Eunsol Choi, Jordan Lee Boyd-Graber, Doug Downey, Aakanksha Naik ·

    DRACULA: Hunting for the Actions Users Want Deep Research Agents to Execute

    arXiv:2604.23815v1 Announce Type: new Abstract: Scientific Deep Research (DR) agents answer user queries by synthesizing research papers into multi-section reports. User feedback can improve their utility, but existing protocols only score the final report, making it hard to stud…

  2. arXiv cs.CL TIER_1 English(EN) · Aakanksha Naik ·

    DRACULA: Hunting for the Actions Users Want Deep Research Agents to Execute

    Scientific Deep Research (DR) agents answer user queries by synthesizing research papers into multi-section reports. User feedback can improve their utility, but existing protocols only score the final report, making it hard to study and learn which intermediate actions DR agents…