PulseAugur

New research questions human citations as AI search benchmark

A new research paper challenges the use of human-written citation lists as ground truth for evaluating literature search systems. The study introduces a 'Deep Research' pipeline and reports that it improves recall substantially over standard API-only search. It also finds that human citations are less relevant, and more biased towards collaborators, than AI-ranked results, which suggests that evaluation should draw on several metrics rather than a single reference list.
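For readers unfamiliar with the metric, recall against a reference list can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the function name `recall_at_k` and the paper identifiers are hypothetical, and the paper's actual protocol may differ.

```python
def recall_at_k(ranked_ids, reference_ids, k):
    """Fraction of reference items that appear in the top-k ranked results."""
    reference = set(reference_ids)
    if not reference:
        return 0.0  # no reference items, so recall is defined here as 0
    top_k = set(ranked_ids[:k])
    return len(top_k & reference) / len(reference)


# Hypothetical identifiers, for illustration only.
human_citations = ["p1", "p2", "p3", "p4"]       # the paper's own reference list
system_ranking = ["p2", "p9", "p1", "p7", "p4"]  # results from a search system

recall_top3 = recall_at_k(system_ranking, human_citations, k=3)
recall_top5 = recall_at_k(system_ranking, human_citations, k=5)
print(recall_top3, recall_top5)  # 0.5 0.75
```

In this toy ranking, "p9" and "p7" lower the score simply because the authors did not cite them, whether or not they are relevant. That is the kind of limitation the paper raises about treating citation lists as ground truth.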

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.AI TIER_1 English(EN) · Gaurav Sahu, Laurent Charlin, Christopher Pal

    Rethinking Literature Search Evaluation: Deep Research Helps, and Human Citation Lists Are Not a Ground Truth

    arXiv:2605.29234v1. Abstract: We study large-scale literature search from two complementary angles: improving the retrieval pipeline, and stress-testing the human reference list as an evaluation target. First, we implement a Deep Research pipeline that processes…

  2. arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Christopher Pal

    Rethinking Literature Search Evaluation: Deep Research Helps, and Human Citation Lists Are Not a Ground Truth

    We study large-scale literature search from two complementary angles: improving the retrieval pipeline, and stress-testing the human reference list as an evaluation target. First, we implement a Deep Research pipeline that processes the full query paper and expands the retrieved …