Charge as a Construct-Validity Factor in Chinese Legal Case Retrieval: A Cross-Benchmark Audit
A new audit of Chinese Legal Case Retrieval (LCR) benchmarks finds that a case's primary charge, which encodes its legal characterization, is a major factor in determining relevance. The researchers report that ranking candidates by whether they share the query's primary charge, combined with BM25, recovers nearly all of the performance gap between basic retrieval methods and advanced trained systems on the LeCaRDv2 benchmark. This suggests that current benchmarks may overstate the legal reasoning capabilities of AI systems, because relevance is often determined by construction rather than by a true understanding of legal principles.
Impact: The audit highlights a potential overestimation of AI's legal reasoning abilities in current benchmarks and suggests a need for more robust evaluation methods.
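To make the charge-plus-BM25 baseline concrete, here is a minimal Python sketch. It is not the audit's implementation: the summary does not say how charge match and BM25 are combined, so the sketch assumes the simplest reading, in which candidates sharing the query's primary charge are placed first and BM25 only breaks ties within each group. The field names (`id`, `charge`, `tokens`), the toy cases, and the BM25 parameters `k1` and `b` are all hypothetical, and texts are assumed to be tokenized already.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n if n else 0.0
    if avgdl == 0:
        return [0.0] * n
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs_tokens:
        df.update(set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for t in set(query_tokens):
            if t not in tf:
                continue
            # Smoothed IDF that stays positive even for very common terms.
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


def charge_then_bm25(query, candidates):
    """Rank charge-matching candidates first; BM25 orders within each group.

    Assumption: this lexicographic combination is one plausible reading of
    "shared primary charge, combined with BM25", not the audit's exact method.
    """
    scores = bm25_scores(query["tokens"], [c["tokens"] for c in candidates])
    ranked = sorted(
        zip(candidates, scores),
        key=lambda pair: (pair[0]["charge"] == query["charge"], pair[1]),
        reverse=True,
    )
    return [c["id"] for c, _ in ranked]


# Hypothetical toy data (English tokens stand in for segmented Chinese text).
query = {"charge": "theft", "tokens": ["stole", "phone", "night"]}
candidates = [
    {"id": "a", "charge": "fraud", "tokens": ["stole", "phone", "night", "market"]},
    {"id": "b", "charge": "theft", "tokens": ["took", "wallet", "store"]},
    {"id": "c", "charge": "theft", "tokens": ["stole", "bicycle"]},
]

print(charge_then_bm25(query, candidates))  # -> ['c', 'b', 'a']
```

In this toy example, case `a` has the highest lexical overlap with the query but is ranked last because its charge differs, while case `b` shares no query terms yet outranks it. That is the behavior the audit's concern turns on: if such a ranking scores well, the relevance labels are tracking the charge more than the reasoning.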