Researchers have introduced L3Cube-MahaPOS, a new dataset for Marathi Part-of-Speech (POS) tagging, addressing the scarcity of annotated resources for the language. The dataset contains over 32,000 manually annotated sentences from news text, aligned with Universal Dependencies. It was used to benchmark six model families, with the best system achieving 88.67% token-level accuracy and a macro-F1 score of 81.67%. The dataset, annotation guidelines, and trained models are being released to promote further research in Marathi Natural Language Processing.
AI IMPACT Facilitates research and development in Marathi NLP, potentially improving downstream applications like machine translation and information extraction for a large speaker base.
RANK_REASON The cluster describes the release of a new academic dataset and associated models for a specific language's NLP task.
- BERT Models
- BiLSTM
- BiLSTM+CharCNN
- CRF
- English
- Hindi
- L3Cube-MahaPOS
- MahaBERT-v2
- Marathi
- Universal Dependencies
AI-generated summary · Google Gemini · from 2 sources.