A new paper argues that current benchmarks for legal AI models are insufficient for evaluating whether those models can improve access to justice. Existing benchmarks test models on pre-processed legal inputs, so they measure an upper bound on performance. Inputs from pro se (self-represented) litigants, by contrast, are often noisy and contain errors; performance on such inputs represents a lower bound that current benchmarks fail to capture. The authors propose developing new legal benchmarks that directly assess model robustness on pro se-like inputs, so that access-to-justice claims can be tested empirically.
AI IMPACT: Current legal AI benchmarks may overestimate model capabilities, potentially hindering genuine improvements in access to justice for pro se litigants.
AI-generated summary · Google Gemini · from 1 source.