AuthorityBench, a new benchmark of 220,564 prompts spanning general knowledge, science, law, and medicine, was developed to study how the presence of citations influences the behavior of large language models. The research found that prompts containing citations, even fabricated ones, consistently produce higher hallucination rates than prompts without citations. The effect is most pronounced when false citations accompany true claims, particularly in the general knowledge domain.
AI IMPACT This research highlights a critical vulnerability in LLMs, suggesting that citation-augmented systems may need significant re-evaluation to mitigate increased hallucination rates.
RANK_REASON The cluster describes a new academic paper introducing a benchmark for evaluating LLM behavior.
- Aravind Ramana Ramanathan Narayanan
- arXiv
- AuthorityBench
- Large Language Models
- github.com/floating-reeds/AuthorityBench
- law
- medicine
- science
AI-generated summary · Google Gemini · from 2 sources.