New MuDABench Benchmark Tests Analytical Question Answering Across Massive Document Collections

By the PulseAugur Editorial Team · [2 sources] · 2026-04-24 05:28

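Researchers have introduced MuDABench, a new benchmark designed for analytical question answering over large document collections. The benchmark requires systems to synthesize information from many sources in order to perform quantitative analysis, a task that current retrieval-augmented generation (RAG) systems struggle with. The proposed multi-agent workflow shows some improvement but still falls short of human expert performance, highlighting challenges in information extraction and domain-specific knowledge.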

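Impact: Highlights the limitations of current RAG systems on complex analytical question answering and points to directions for future research and development.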

Ranking rationale: This is a research paper introducing a new benchmark for a specific AI task.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.CL TIER_1 English(EN) · Zhanli Li, Yixuan Cao, Lvzhou Luo, Ping Luo · 2026-04-27 04:00

Navigating Large-Scale Document Collections: MuDABench for Multi-Document Analytical QA

arXiv:2604.22239v1 Announce Type: new Abstract: This paper introduces the task of analytical question answering over large, semi-structured document collections. We present MuDABench, a benchmark for multi-document analytical QA, where questions require extracting and synthesizin…

arXiv cs.CL TIER_1 English(EN) · Ping Luo · 2026-04-24 05:28

Navigating Large-Scale Document Collections: MuDABench for Multi-Document Analytical QA

This paper introduces the task of analytical question answering over large, semi-structured document collections. We present MuDABench, a benchmark for multi-document analytical QA, where questions require extracting and synthesizing information across numerous documents to perfo…