PulseAugur

New Marathi POS Tagging Dataset and BERT Models Released

Researchers have introduced L3Cube-MahaPOS, a new dataset for Marathi Part-of-Speech (POS) tagging, addressing the scarcity of annotated resources for the language. The dataset contains over 32,000 manually annotated sentences from news text, aligned with Universal Dependencies. It was used to benchmark six model families, with the best system achieving 88.67% token-level accuracy and a macro-F1 score of 81.67%. The dataset, annotation guidelines, and trained models are being released to promote further research in Marathi Natural Language Processing.
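
The two reported metrics measure different things: token-level accuracy counts every token equally, while macro-F1 averages per-tag F1 scores so that rare tags weigh as much as frequent ones. The sketch below is a minimal illustration on made-up toy data, not the paper's evaluation code. The function names, the toy tag sequences, and the choice to average over every tag that appears in either the gold or the predicted labels are all assumptions, since the summary does not describe the exact evaluation setup.

```python
from collections import Counter


def token_accuracy(gold, pred):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    pairs = [(g, p) for gs, ps in zip(gold, pred) for g, p in zip(gs, ps)]
    return sum(g == p for g, p in pairs) / len(pairs)


def macro_f1(gold, pred):
    """Unweighted mean of per-tag F1 (assumed convention: tags seen in gold or pred)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gs, ps in zip(gold, pred):
        for g, p in zip(gs, ps):
            if g == p:
                tp[g] += 1
            else:
                fp[p] += 1  # predicted tag p was wrong
                fn[g] += 1  # gold tag g was missed
    tags = set(tp) | set(fp) | set(fn)
    # F1 = 2TP / (2TP + FP + FN); the denominator is nonzero for every tag in `tags`.
    scores = [2 * tp[t] / (2 * tp[t] + fp[t] + fn[t]) for t in tags]
    return sum(scores) / len(scores)


# Hypothetical toy data: two sentences tagged with UD-style labels.
gold = [["NOUN", "VERB", "PUNCT"], ["PRON", "NOUN", "NOUN", "VERB"]]
pred = [["NOUN", "VERB", "PUNCT"], ["PRON", "NOUN", "ADJ", "VERB"]]

print(f"accuracy = {token_accuracy(gold, pred):.4f}")
print(f"macro-F1 = {macro_f1(gold, pred):.4f}")
```

In this toy case a single wrong token drags macro-F1 well below accuracy, because the spurious ADJ prediction contributes an F1 of zero to the average. A gap in the same direction appears in the reported figures (88.67% versus 81.67%), though the summary does not say which tags account for it.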

IMPACT Facilitates research and development in Marathi NLP, potentially improving downstream applications like machine translation and information extraction for a large speaker base.

RANK_REASON The cluster describes the release of a new academic dataset and associated models for a specific language's NLP task.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CL TIER_1 English(EN) · Hariom Ingle, Ronit Ghode, Ishwari Gondkar, Jidnyasa Harad, Raviraj Joshi ·

    L3Cube-MahaPOS: A Marathi Part-of-Speech Tagging Dataset and BERT Models

    arXiv:2606.24825v1. Part-of-Speech (POS) tagging is a foundational NLP task underpinning machine translation, information extraction, and syntactic parsing. Despite Marathi being spoken by over 83 million people and ranking among the top twenty most sp…

  2. arXiv cs.CL TIER_1 English(EN) · Raviraj Joshi ·

    L3Cube-MahaPOS: A Marathi Part-of-Speech Tagging Dataset and BERT Models

    Part-of-Speech (POS) tagging is a foundational NLP task underpinning machine translation, information extraction, and syntactic parsing. Despite Marathi being spoken by over 83 million people and ranking among the top twenty most spoken languages worldwide, it remains severely un…