Researchers have investigated domain adaptation for modern BERT models within the legal sector. They continued pre-training an existing ModernBERT checkpoint on a large corpus of US court opinions with a masked language modeling objective, rather than training from scratch, and report significant performance improvements on legal datasets over the vanilla model. The resulting models process up to 8,192 tokens and generate meaningful embeddings for legal passages. They are now publicly available.
AI IMPACT Enhances specialized AI capabilities for legal professionals, potentially improving legal research and document analysis.
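The masked language modeling objective mentioned in the summary can be illustrated with a minimal sketch of the token-masking step. This is not the researchers' code: the function name `mask_tokens`, the 15% masking rate, and the 80/10/10 replacement split are assumptions borrowed from the original BERT recipe, and the summary does not state which settings were actually used.

```python
import random

MASK_TOKEN = "[MASK]"  # placeholder mask symbol (assumption, not from the summary)


def mask_tokens(tokens, mask_prob=0.15, vocab=None, seed=None):
    """Return (inputs, labels) for one masked language modeling example.

    Hypothetical helper: mask_prob and the 80/10/10 split follow the
    original BERT recipe, not necessarily the settings used in the paper.
    """
    rng = random.Random(seed)
    vocab = list(vocab) if vocab else list(tokens)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            # Selected position: the model must predict the original token.
            labels.append(tok)
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK_TOKEN)         # 80%: replace with the mask symbol
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                inputs.append(tok)                # 10%: keep the token unchanged
        else:
            # Unselected position: input is unchanged and excluded from the loss.
            inputs.append(tok)
            labels.append(None)
    return inputs, labels


if __name__ == "__main__":
    sentence = "the court affirmed the judgment of the district court".split()
    masked, targets = mask_tokens(sentence, seed=0)
    print(masked)
    print(targets)
```

During continued pre-training, the model would see `inputs` and be trained to recover the non-`None` entries of `labels`; starting from an existing checkpoint means only the domain-specific text statistics need to be learned.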
RANK_REASON The cluster contains an academic paper detailing research on adapting existing language models for a specific domain.
AI-generated summary · Google Gemini · from 1 source.