PulseAugur

New benchmark reveals legal LLMs struggle with citation accuracy

Researchers have developed LegalCiteBench, a new benchmark designed to evaluate the reliability of legal language models in generating accurate case citations. The benchmark, comprising approximately 24,000 instances derived from 1,000 U.S. judicial opinions, covers tasks such as citation retrieval, completion, error detection, and case verification. Testing revealed that even advanced models struggle with exact citation recovery, scoring below 70% on critical tasks, and many exhibit high rates of fabricated, incorrect, or irrelevant authorities.
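The "exact citation recovery" figure implies a strict string-match scoring rule. A minimal sketch of how such a metric could be computed is below; the normalization rules and function names are illustrative assumptions, not the paper's actual evaluation code.

```python
import re


def normalize_citation(cite: str) -> str:
    # Collapse whitespace so trivially different renderings of the
    # same citation compare equal (assumed normalization, not the
    # benchmark's documented procedure).
    return re.sub(r"\s+", " ", cite.strip())


def exact_match_accuracy(predictions, references):
    # Fraction of model-generated citations that exactly match the
    # gold citation after normalization.
    hits = sum(
        normalize_citation(p) == normalize_citation(r)
        for p, r in zip(predictions, references)
    )
    return hits / len(references)


preds = ["347 U.S. 483 (1954)", "410 U.S. 113  (1972)"]  # second year is wrong
golds = ["347 U.S. 483 (1954)", "410 U.S. 113 (1973)"]
print(exact_match_accuracy(preds, golds))  # 0.5
```

Under a rule like this, a model that gets the case name right but a reporter volume or year wrong scores zero for that instance, which is one reason exact-recovery scores can fall well below other accuracy measures.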

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT New benchmark highlights critical citation reliability issues in legal LLMs, potentially impacting adoption in legal drafting and research.

RANK_REASON The cluster contains an academic paper introducing a new benchmark for evaluating LLMs.

Read on arXiv cs.AI →

COVERAGE [1]

  1. arXiv cs.AI TIER_1 · Shunfan Zhou

    LegalCiteBench: Evaluating Citation Reliability in Legal Language Models

    Large language models (LLMs) are increasingly integrated into legal drafting and research workflows, where incorrect citations or fabricated precedents can cause serious professional harm. Existing legal benchmarks largely emphasize statutory reasoning, contract understanding, or…