ModernBERT models adapted for legal domain show significant performance gains

By PulseAugur Editorial · [1 source] · 2026-06-30 04:00

Researchers have investigated domain adaptation of ModernBERT for the legal domain. Rather than training a model from scratch, they took an existing ModernBERT checkpoint and continued pre-training it on a large corpus of US court opinions with the masked language modeling objective. The adapted models reportedly improve significantly over vanilla ModernBERT on legal datasets, process inputs of up to 8,192 tokens, and produce meaningful embeddings for legal passages. The models are publicly available.
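To make the masked language modeling objective concrete, here is a minimal Python sketch of how one training example could be prepared. It is an illustration under stated assumptions, not the authors' pipeline: the function name `mask_tokens`, the whitespace tokenization, the `[MASK]` symbol, and the 30% masking rate are all hypothetical choices made for the example. Only the 8,192-token limit comes from the summary above. The `-100` ignore label follows a common convention for marking positions that the loss should skip.

```python
import random

MASK_TOKEN = "[MASK]"    # placeholder mask symbol (assumption for this sketch)
IGNORE_INDEX = -100      # label meaning "do not score this position"
MAX_TOKENS = 8192        # context length reported in the summary


def mask_tokens(tokens, mask_prob=0.3, seed=0):
    """Build one masked-language-modeling example from a token list.

    The input is truncated to MAX_TOKENS. Each remaining token is then
    independently replaced by MASK_TOKEN with probability mask_prob
    (an illustrative value, not taken from the paper). The returned
    labels hold the original token at masked positions and
    IGNORE_INDEX everywhere else.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens[:MAX_TOKENS]:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(IGNORE_INDEX)
    return masked, labels


if __name__ == "__main__":
    # Whitespace splitting stands in for a real subword tokenizer.
    sentence = "the court held that the statute applies to the defendant"
    masked, labels = mask_tokens(sentence.split(), mask_prob=0.3, seed=42)
    print(masked)
    print(labels)
```

During continued pre-training, the model would see the masked sequence and be trained to predict the original tokens at the masked positions only. Starting from an existing checkpoint means the weights already encode general language knowledge, so the legal corpus only has to shift them toward the domain.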

IMPACT Enhances specialized AI capabilities for legal professionals, potentially improving legal research and document analysis.

RANK_REASON The cluster contains an academic paper detailing research on adapting existing language models for a specific domain.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Dominik Stammbach, Peter Henderson · 2026-06-30 04:00

Legal Domain Adaptation of Modern BERT Models

arXiv:2606.28538v1. Abstract: We investigate domain adaptation of modern BERT models in the legal domain. We further pre-train ModernBERT on all US court opinions using the masked language modeling objective. Although ModernBERT has been trained on roughly 500x …
