This paper introduces the first benchmark for part-of-speech tagging in Tajik and compares several neural network architectures on it. The study uses the TajPersParallel corpus and frames the task as context-independent classification of isolated lexical units. The best results came from mBERT fine-tuned with LoRA, though every model struggled to resolve morphological ambiguity in the absence of syntactic context.
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Establishes a baseline for NLP tasks in Tajik, highlighting challenges in morphological ambiguity for low-resource languages.
RANK_REASON This is a research paper presenting a new benchmark and comparative study of neural architectures for a specific NLP task.