
TOOL · arXiv cs.AI English(EN) · 1w

LP-Eval: Rubric and Dataset for Measuring the Quality of Legal Proposition Generation

Researchers have developed LP-Eval, a rubric and dataset for measuring the quality of legal propositions generated by large language models (LLMs). Co-created with legal experts, the rubric assesses propositions on formal validity and on substantive dimensions, and the work uses decisions from the Court of Justice of the European Union as source cases. The findings indicate that LLMs can produce well-formed legal propositions, though quality varies with the recency of the source cases. The study also found that LLMs can act as evaluators, aligning more closely with expert assessments when guided by the rubric than when scoring propositions directly.

IMPACT: Provides a structured method for evaluating the quality of AI-generated legal text, potentially improving LLM performance in legal applications.

large language models
Court of Justice of the European Union
LP-Eval