A new research paper challenges the reliability of human-generated citation lists as ground truth for evaluating literature search systems. The study introduces a 'Deep Research' pipeline that significantly improves recall compared with standard API-only searches. It also finds that human citations are less relevant and more biased toward collaborators than AI-ranked results, suggesting a need for multi-faceted evaluation metrics.
Read on arXiv cs.IR (Information Retrieval) →
AI-generated summary · Google Gemini · from 2 sources.