PulseAugur
EN
LIVE 13:29:18

moBERTo: New Portuguese Language Model Enhances NLP Tasks

Researchers have introduced moBERTo, a new Portuguese language model derived from ModernBERT through continued pretraining. This model was trained on 60 billion tokens, incorporating data from FineWeb2 and filtered STEM and educational content. moBERTo has demonstrated strong performance in various natural language processing tasks, including information retrieval, document classification, named entity recognition, and natural language understanding, particularly excelling in Portuguese retrieval benchmarks. AI

IMPACT Enhances NLP capabilities for Portuguese, potentially improving information retrieval and document understanding in the language.

RANK_REASON The item describes a new language model released as a research paper on arXiv, detailing its training process and evaluation. [lever_c_demoted from research: ic=1 ai=1.0]

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 1 sources. How we write summaries →

moBERTo: New Portuguese Language Model Enhances NLP Tasks

COVERAGE [1]

  1. arXiv cs.CL TIER_1 English(EN) · Giovana Kerche Bonás ·

    moBERTo: A Modern Encoder for Portuguese via Continued Pretraining of ModernBERT

    Encoder-only transformer models remain essential for production NLP pipelines. We introduce moBERTo, a Portuguese adaptation of ModernBERT obtained through continued pretraining of the ModernBERT-base checkpoint on 60 billion tokens (5 epochs over a 12-billion-token corpus curate…