PulseAugur
TajikNLP toolkit offers comprehensive open-source processing for Tajik language

Researchers have developed TajikNLP, an open-source Python library designed to process the Tajik language, which is written in Cyrillic script and has been underserved by existing NLP tools. The toolkit offers a comprehensive pipeline including cleaning, tokenization, morphological analysis, and sentiment analysis, with a novel morphology engine to handle complex inflections. Accompanying the library are four newly published linguistic datasets to support future research and applications.

Impact: Establishes foundational NLP infrastructure for the Tajik language, enabling new academic and industrial applications.

Ranking rationale: This is a research paper introducing an open-source toolkit for a low-resource language.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.CL · TIER_1 · English (EN) · Mullosharaf K. Arabov

    TajikNLP: An Open-Source Toolkit for Comprehensive Text Processing of Tajik (Cyrillic Script)

    arXiv:2605.04583v1. Abstract: The Tajik language, written in Cyrillic script, remains severely under-resourced in terms of publicly available natural language processing (NLP) toolkits, hindering both linguistic research and applied development. This paper intro…

  2. arXiv cs.CL · TIER_1 · English (EN) · Mullosharaf K. Arabov

    TajikNLP: An Open-Source Toolkit for Comprehensive Text Processing of Tajik (Cyrillic Script)

    The Tajik language, written in Cyrillic script, remains severely under-resourced in terms of publicly available natural language processing (NLP) toolkits, hindering both linguistic research and applied development. This paper introduces TajikNLP, an open-source Python library th…