Rethinking Literature Search Evaluation: Deep Research Helps, and Human Citation Lists Are Not a Ground Truth

New Study Questions the Reliability of Human Citations as a Benchmark for AI Search

By the PulseAugur editorial team · [2 sources] · 2026-05-28 01:50

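A new research paper challenges the use of human-generated citation lists as ground truth for evaluating literature retrieval systems. The study introduces a "Deep Research" pipeline that significantly improves recall compared with standard API-only search. It also finds that human citations are less relevant than AI-ranked results and skew more toward collaborators, which suggests that evaluation needs multi-faceted metrics.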
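To make the recall comparison concrete, here is a minimal sketch of recall@k measured against a human reference list. It is an illustration only: the function name `recall_at_k`, the paper IDs, and both ranked lists are hypothetical, and the paper's actual metric definitions may differ.

```python
def recall_at_k(retrieved, reference, k):
    """Fraction of reference items that appear among the top-k retrieved items."""
    reference_set = set(reference)
    if not reference_set:
        raise ValueError("reference list must not be empty")
    top_k = set(retrieved[:k])
    return len(top_k & reference_set) / len(reference_set)


# Hypothetical paper IDs, invented for illustration (not data from the paper).
human_citations = ["p1", "p2", "p3", "p4"]
api_only = ["p1", "x9", "x8", "p2", "x7"]
deep_research = ["p1", "p2", "x9", "p3", "x8"]

print(recall_at_k(api_only, human_citations, k=5))
print(recall_at_k(deep_research, human_citations, k=5))
```

Note that a metric of this kind treats every retrieved item missing from the human list as a miss, even if it is relevant. That is one reason a reference list that is not a true ground truth calls for additional evaluation signals.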

Ranking rationale: This cluster contains an academic paper detailing new research findings and methods.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Gaurav Sahu, Laurent Charlin, Christopher Pal · 2026-05-29 04:00

Rethinking Literature Search Evaluation: Deep Research Helps, and Human Citation Lists Are Not a Ground Truth

arXiv:2605.29234v1. Abstract: We study large-scale literature search from two complementary angles: improving the retrieval pipeline, and stress-testing the human reference list as an evaluation target. First, we implement a Deep Research pipeline that processes…

arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Christopher Pal · 2026-05-28 01:50

Rethinking Literature Search Evaluation: Deep Research Helps, and Human Citation Lists Are Not a Ground Truth

We study large-scale literature search from two complementary angles: improving the retrieval pipeline, and stress-testing the human reference list as an evaluation target. First, we implement a Deep Research pipeline that processes the full query paper and expands the retrieved …