Researchers have introduced ScholarQuest, a new benchmark for evaluating how well AI agents search for academic papers. It is built on more than 1,000 computer science topics and four distinct research intents, and it aims to provide a more realistic and systematic assessment than existing evaluation methods. Initial benchmarking shows that agentic approaches outperform traditional single-shot retrieval. Their effectiveness still leaves significant room for improvement, however: even the best current agents achieve only limited recall.
AI impact: This benchmark could accelerate the development of more effective AI-powered academic search tools.
Ranking rationale: The cluster describes a new benchmark for evaluating AI agents, presented in an academic paper.
Read on arXiv cs.IR (Information Retrieval).
AI-generated summary · Google Gemini · from 1 source.