From Statute to Control Flow: Span-Grounded Deontic Trees for Defeasible Scope Parsing
Researchers have introduced NormBench, a benchmark for evaluating how well AI models parse legal and policy documents, with a focus on identifying nested exceptions and counter-exceptions. The benchmark represents rules and their exceptions as Span-Grounded Deontic Trees (SG-DT), allowing for more precise scope parsing. Evaluations of current large language models revealed two failure modes, termed "Recursion Decay" and the "Auditability Trap," which point to difficulty with complex rule structures and exceptions. SG-DT showed promise in improving performance on these specific challenges.
AI IMPACT: Highlights limitations in current LLMs for precise legal and policy interpretation, suggesting a need for improved reasoning and auditability in rule-following agents.
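To make the idea of a rule with a nested exception and counter-exception concrete, the following is a minimal sketch, not the benchmark's actual schema. The `Node` class, the `applies` function, the `span_of` helper, the fact keys, and the example sentence are all hypothetical and chosen for illustration only. Each node stores character offsets into the source text (the "span-grounded" part), and a node holds only if its own condition is met and none of its exceptions holds.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Facts = Dict[str, bool]
Span = Tuple[int, int]

# Hypothetical example sentence; not taken from NormBench.
TEXT = ("Vehicles are prohibited in the park, except emergency vehicles, "
        "unless the park is closed.")


def span_of(phrase: str) -> Span:
    """Return (start, end) character offsets of phrase within TEXT."""
    start = TEXT.index(phrase)
    return (start, start + len(phrase))


@dataclass
class Node:
    """One clause of a rule, grounded in a span of the source text."""
    span: Span
    condition: Callable[[Facts], bool]
    exceptions: List["Node"] = field(default_factory=list)


def applies(node: Node, facts: Facts, trail: List[Span]) -> bool:
    """A node applies if its condition holds and no exception applies.

    Spans of every clause whose condition held are appended to `trail`,
    so the decision can be traced back to the text.
    """
    if not node.condition(facts):
        return False
    trail.append(node.span)
    return not any(applies(e, facts, trail) for e in node.exceptions)


# Rule -> exception -> counter-exception, three levels deep.
counter = Node(span_of("unless the park is closed"),
               lambda f: f.get("closed", False))
exception = Node(span_of("except emergency vehicles"),
                 lambda f: f.get("emergency", False), [counter])
rule = Node(span_of("Vehicles are prohibited in the park"),
            lambda f: f.get("vehicle", False), [exception])


def quoted(trail: List[Span]) -> List[str]:
    """Recover the source text behind each recorded span."""
    return [TEXT[s:e] for s, e in trail]
```

Under these assumptions, a plain vehicle is prohibited, an emergency vehicle is exempt, and an emergency vehicle in a closed park is prohibited again because the counter-exception defeats the exception. The `trail` list gives the text spans that were consulted for each decision.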