PulseAugur

New benchmark audits LLM scholar recommendations and user interventions

Researchers have developed LLMScholarBench, a new benchmark for auditing large language models (LLMs) used to recommend academic experts. The benchmark evaluates both the models' inherent capabilities and the effect of interventions that users make at inference time. Experiments with 22 LLMs on physics expert recommendation showed that each intervention studied (temperature adjustment, diversity-focused prompting, and retrieval-augmented generation, or RAG) carries its own trade-offs across metrics such as factuality, diversity, and representation.
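Temperature adjustment is one of the interventions named above. The sketch below gives general background only and does not reproduce the paper's procedure. It shows how temperature rescales a model's logits before softmax sampling. Higher temperatures flatten the distribution, which raises entropy and spreads probability away from the top candidate. Lower temperatures concentrate probability on the top candidate. The four logits and their framing as candidate scholar names are hypothetical and chosen only for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, dividing by the temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher values mean a flatter, more diverse distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical logits for four candidate scholar names (illustrative values only).
logits = [4.0, 2.5, 1.0, 0.5]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: top-candidate prob={probs[0]:.3f}, entropy={entropy(probs):.3f}")
```

Running the loop shows the top candidate's probability falling and the entropy rising as the temperature increases. This is one general mechanism by which a temperature change could move diversity and factuality in opposite directions. The summary does not say this is how the benchmark measures those effects.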

IMPACT Provides a framework for evaluating and improving the fairness and accuracy of LLM-driven academic discovery tools.

RANK_REASON The cluster contains an academic paper detailing a new benchmark for evaluating LLM performance.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.AI TIER_1 English (EN) · Lisette Espín-Noboa, Gonzalo Gabriel Méndez

    Whose Name Comes Up? II: Benchmarking and Intervention-Based Auditing of LLM-Based Scholar Recommendation

    arXiv:2602.08873v2 Announce Type: replace-cross Abstract: Large language models (LLMs) are now used for academic expert recommendation. Existing audits typically evaluate such recommendations in isolation, ignoring end-user inference-time interventions. Thus, it remains unclear w…