Brief · PulseAugur

TOOL · arXiv cs.AI · English (EN) · 1w

Whose Name Comes Up? II: Benchmarking and Intervention-Based Auditing of LLM-Based Scholar Recommendation

Researchers have developed LLMScholarBench, a benchmark for auditing large language models (LLMs) used to recommend academic experts. It evaluates both the models' inherent capabilities and the effect of user interventions applied during the recommendation process. Experiments with 22 LLMs on physics expert recommendation showed that each intervention studied (temperature adjustment, diversity-focused prompting, and retrieval-augmented generation, or RAG) carries its own trade-offs across metrics such as factuality, diversity, and representation.
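The brief does not say how LLMScholarBench defines or computes its metrics. Purely as an illustration, the sketch below shows one hypothetical way an auditor might score a single list of recommended scholars. It measures factuality as the share of names found in a reference roster, diversity as normalized Shannon entropy over a group attribute, and representation as a group's share relative to a baseline. The function names, roster, group labels, and baseline share are all assumptions, not the benchmark's actual definitions.

```python
from collections import Counter
from math import log


def factuality(recommended, known_scholars):
    """Hypothetical metric: fraction of recommended names present in a reference roster."""
    if not recommended:
        return 0.0
    hits = sum(1 for name in recommended if name in known_scholars)
    return hits / len(recommended)


def diversity(groups, num_categories):
    """Hypothetical metric: Shannon entropy of group labels, normalized by log(num_categories).

    Returns 0.0 when every recommendation falls in one group and 1.0 when
    recommendations are spread evenly over all num_categories groups.
    """
    if not groups or num_categories < 2:
        return 0.0
    n = len(groups)
    counts = Counter(groups)
    entropy = -sum((c / n) * log(c / n) for c in counts.values())
    return entropy / log(num_categories)


def representation_gap(groups, group, population_share):
    """Hypothetical metric: share of `group` among recommendations minus an assumed baseline share."""
    if not groups:
        return 0.0
    return groups.count(group) / len(groups) - population_share


# Placeholder data: names, roster, and group labels are invented for illustration.
recommended = ["Scholar A", "Scholar B", "Scholar C", "Scholar D"]
roster = {"Scholar A", "Scholar B", "Scholar C"}  # "Scholar D" stands in for a hallucinated name
groups = ["g1", "g1", "g1", "g2"]                 # one assumed group label per recommended scholar

print(factuality(recommended, roster))            # 0.75
print(round(diversity(groups, 2), 3))
print(representation_gap(groups, "g2", 0.5))      # -0.25
```

Under this framing, an audit of an intervention such as a temperature change would rerun the same prompts under each setting and compare these scores side by side, which is one way the trade-offs the brief mentions could surface.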

IMPACT: Provides a framework for evaluating and improving the fairness and accuracy of LLM-driven academic discovery tools.

Large Language Models
Lisette Elizabeth Espín Noboa
LLMScholarBench