EURO-5K: When Does Domain Pretraining Matter? Benchmarking Transformers for EU Reporting Obligation Extraction
Researchers have developed EURO-5K, a new dataset for extracting reporting obligations from EU legislation, a task central to compliance automation. They compared transformer-based models, including BERT and LLMs, under both full fine-tuning and parameter-efficient QLoRA. For sentence-level extraction, fully fine-tuned generic and legal BERT models performed comparably to fine-tuned LLMs. Legal pretraining offered only marginal benefits for generative models but significant advantages under parameter-efficient tuning.
AI IMPACT: Provides a specialized dataset and model evaluations for automating regulatory compliance, potentially reducing the burden on businesses operating within the EU.