Benchmarking POS Tagging for the Tajik Language: A Comparative Study of Neural Architectures on the TajPersParallel Corpus

New benchmark study examines how neural networks perform on Tajik part-of-speech tagging

By PulseAugur Editorial Team · [2 sources] · 2026-05-06 07:26

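The paper presents the first benchmark for part-of-speech (POS) tagging in Tajik and evaluates a range of neural network architectures. The study uses the TajPersParallel corpus and focuses on context-independent classification of isolated lexical units. The results show that an mBERT model fine-tuned with LoRA performs best, but every model struggles with morphological ambiguity when no syntactic context is available.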
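The point about isolated lexical units can be illustrated with a small sketch. A tagger that sees only the word form, with no sentence around it, must assign the same tag to that form every time, so its accuracy is capped by how often each form takes its most frequent tag. The Python below computes that cap. The function name `isolated_word_ceiling` and the English toy data are illustrative assumptions, not taken from the paper or from the TajPersParallel corpus.

```python
from collections import Counter, defaultdict


def isolated_word_ceiling(pairs):
    """Return the best accuracy any word -> tag function can reach on `pairs`.

    `pairs` is a list of (word, gold_tag) tuples. Because a context-free
    tagger must give one fixed tag per word form, the best it can do for
    each form is to pick that form's most frequent gold tag.
    """
    if not pairs:
        raise ValueError("pairs must not be empty")

    # Count how often each word form appears with each gold tag.
    tag_counts = defaultdict(Counter)
    for word, tag in pairs:
        tag_counts[word][tag] += 1

    # Sum the hits obtained by always choosing the majority tag per form.
    best_hits = sum(max(counts.values()) for counts in tag_counts.values())
    return best_hits / len(pairs)


# Hypothetical toy data (English, for readability only; NOT corpus data).
toy = [
    ("run", "VERB"), ("run", "NOUN"), ("run", "VERB"),  # ambiguous form
    ("light", "ADJ"), ("light", "NOUN"),                # ambiguous form
    ("the", "DET"), ("the", "DET"),                     # unambiguous
    ("quickly", "ADV"),                                 # unambiguous
]

print(f"ceiling = {isolated_word_ceiling(toy):.3f}")  # ceiling = 0.750
```

On this toy set two of the eight tokens are unrecoverable without context, so even a perfect context-free model stops at 0.75. The same calculation on a real annotated word list would give a rough upper bound against which model scores could be read.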

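Impact: Establishes a baseline for Tajik natural language processing tasks and highlights the challenges that morphological ambiguity poses for low-resource languages.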

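Ranking rationale: This is a research paper that introduces a new benchmark and a comparative study of neural network architectures for a specific natural language processing task.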

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.CL TIER_1 English(EN) · Mullosharaf K. Arabov · 2026-05-07 04:00

Benchmarking POS Tagging for the Tajik Language: A Comparative Study of Neural Architectures on the TajPersParallel Corpus

arXiv:2605.04576v1 Announce Type: new Abstract: This paper presents the first benchmark for the task of automatic part-of-speech (POS) tagging for the Tajik language. Despite the existence of multilingual language models demonstrating high effectiveness for many of the world's la…

arXiv cs.CL TIER_1 English(EN) · Mullosharaf K. Arabov · 2026-05-06 07:26

Benchmarking POS Tagging for the Tajik Language: A Comparative Study of Neural Architectures on the TajPersParallel Corpus

This paper presents the first benchmark for the task of automatic part-of-speech (POS) tagging for the Tajik language. Despite the existence of multilingual language models demonstrating high effectiveness for many of the world's languages, their capacity for grammatical analysis…