LLMs show promise in grading German legal exams

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 1 source

Researchers have developed GradeLegal, a system that uses large language models to automate the grading of German legal exam solutions. The study evaluated 27 LLMs across several prompting strategies and found that reasoning-oriented models can reach high agreement with expert graders in public law, with a quadratic weighted kappa of 0.91. Agreement was lower in criminal law, which suggests that grading in that field is harder to automate. Ensembling multiple models further improved grading accuracy, offering a potential alternative to top-tier proprietary models.
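For readers unfamiliar with the agreement metric, the sketch below shows one common way to compute a quadratic weighted kappa between expert grades and model grades, plus a simple median-based ensemble. It is an illustration only, not the paper's code: the function names, the example grades, and the choice of the median as the ensembling rule are assumptions, since the summary does not say how the models were combined. The 0 to 18 point range is assumed here as the grade scale.

```python
from collections import Counter
from statistics import median


def quadratic_weighted_kappa(expert, model, min_grade=0, max_grade=18):
    """Agreement between two integer grade lists; 1.0 means identical grades.

    Assumption: grades are integers in [min_grade, max_grade].
    """
    if len(expert) != len(model) or not expert:
        raise ValueError("grade lists must be non-empty and of equal length")
    num_levels = max_grade - min_grade + 1
    if num_levels < 2:
        raise ValueError("need at least two grade levels")

    n = len(expert)
    observed = Counter(zip(expert, model))  # joint counts of (expert, model)
    hist_expert = Counter(expert)           # marginal counts per grader
    hist_model = Counter(model)

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_grade, max_grade + 1):
        for j in range(min_grade, max_grade + 1):
            # Quadratic penalty: large disagreements cost far more than small ones.
            weight = (i - j) ** 2 / (num_levels - 1) ** 2
            weighted_observed += weight * observed[(i, j)]
            # Counts expected if the two graders were statistically independent.
            weighted_expected += weight * hist_expert[i] * hist_model[j] / n

    if weighted_expected == 0:
        return 1.0  # both graders gave the same single grade everywhere
    return 1.0 - weighted_observed / weighted_expected


def ensemble_grades(per_model_grades):
    """Hypothetical ensemble: per-exam median across models, rounded to an int."""
    return [round(median(grades)) for grades in zip(*per_model_grades)]


# Made-up grades for five exam solutions, for illustration only.
expert_grades = [4, 7, 9, 12, 15]
model_outputs = [
    [5, 7, 8, 12, 14],
    [4, 6, 9, 13, 15],
    [3, 7, 10, 12, 16],
]
combined = ensemble_grades(model_outputs)
kappa = quadratic_weighted_kappa(expert_grades, combined)
```

Because the penalty grows with the square of the grade difference, a model that is off by one point on many exams can still score a high kappa, while a few large misses pull the value down sharply.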

IMPACT Automated grading systems could streamline feedback for legal students and reduce bottlenecks for educators.

RANK_REASON The cluster contains an academic paper presenting a new methodology and evaluation of LLMs for a specific task.

COVERAGE [1]

arXiv cs.CL TIER_1 · Jelena Mitrovic · 2026-05-20 12:09

GradeLegal: Automated Grading for German Legal Cases

Grading German legal exam solutions faces growing volumes and a shortage of qualified graders, delaying feedback and creating a bottleneck. At the same time, it is a high-stakes expert task, since state exam grades strongly influence career outcomes in Germany. Despite this pract…
