A Systematic Benchmark of Machine Transliteration Models for the Tajik-Farsi Language Pair: A Comparative Study from Rule-Based to Transformer Architectures

New study establishes a benchmark for Tajik-Persian machine transliteration models

By the PulseAugur editorial team · [2 sources] · 2026-05-04 06:24

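This paper introduces a new benchmark for machine transliteration between Tajik and Persian and builds a unique parallel corpus from several different sources. The study compares six model architectures, including rule-based systems, LSTMs, Transformers, and pretrained multilingual models. The results show that, for this language pair, byte-level and character-level models (ByT5 in particular) clearly outperform subword-based models such as mT5.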
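To show what the simplest baseline family looks like, here is a minimal sketch of a rule-based transliterator, assuming a fixed letter-to-letter table. The table, `TOY_TABLE`, and `rule_based_transliterate` are hypothetical illustrations covering only five letters. They are not the paper's rule set. With this table, Tajik "китоб" ("book") comes out as Persian "کتاب".

```python
# Hypothetical toy table: a few Tajik Cyrillic letters, each mapped to ONE
# default Persian (Arabic-script) letter. Not the paper's rules.
TOY_TABLE = {
    "\u043a": "\u06a9",  # к -> ک (keheh)
    "\u0442": "\u062a",  # т -> ت
    "\u043e": "\u0627",  # о -> ا (alef)
    "\u0431": "\u0628",  # б -> ب
    "\u0438": "",        # и -> dropped (short vowel often unwritten; a simplification)
}


def rule_based_transliterate(word: str) -> str:
    """Map each letter through TOY_TABLE; letters not in the table pass through unchanged."""
    return "".join(TOY_TABLE.get(ch, ch) for ch in word.lower())


if __name__ == "__main__":
    # "\u043a\u0438\u0442\u043e\u0431" is китоб, Tajik "kitob" (book)
    print(rule_based_transliterate("\u043a\u0438\u0442\u043e\u0431"))
```

A fixed table breaks down quickly. The same "drop и" rule would wrongly delete the word-initial vowel in "ин" ("this"), which Persian writes as "این". In addition, a single Cyrillic letter such as "с" corresponds to several Arabic-script letters (س, ص, ث). That one-to-many ambiguity is what the learned models in the study have to resolve from context.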

Impact: Highlights the advantage of byte- and character-level models over subword tokenization for specific transliteration tasks.
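The summary does not say why byte- and character-level models win. One common intuition is that they see every letter of the input, so letter-to-letter correspondences between the two scripts stay visible, whereas a subword vocabulary may split the same word differently on each side. The sketch below shows only the two fine-grained views of one word pair. It assumes the ByT5 convention that a byte id equals the UTF-8 byte value plus 3 (ids 0 to 2 reserved for pad, eos, and unk). Subword tokenization is omitted because it depends on a learned vocabulary, such as mT5's SentencePiece model.

```python
# Compare character-level and byte-level token counts for one Tajik/Persian pair.
# Assumption: ByT5-style byte ids = UTF-8 byte value + 3 (ids 0-2 are special tokens).

TAJIK = "\u043a\u0438\u0442\u043e\u0431"  # китоб (Cyrillic script)
PERSIAN = "\u06a9\u062a\u0627\u0628"      # کتاب (Arabic script)
BYTE_OFFSET = 3  # assumed ByT5-style offset past pad/eos/unk


def char_tokens(text: str) -> list[str]:
    """Character-level tokenization: one token per Unicode code point."""
    return list(text)


def byte_ids(text: str) -> list[int]:
    """Byte-level tokenization: one id per UTF-8 byte, shifted past the special ids."""
    return [b + BYTE_OFFSET for b in text.encode("utf-8")]


if __name__ == "__main__":
    for label, word in (("Tajik", TAJIK), ("Persian", PERSIAN)):
        print(label, len(char_tokens(word)), "chars,", len(byte_ids(word)), "byte ids")
```

Both scripts' basic letters fall in the two-byte UTF-8 range, so each letter becomes exactly two byte ids. The byte view is therefore a fixed, lossless expansion of the character view, with no vocabulary to learn.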

Why it ranks: This is a research paper that proposes a new benchmark and a comparative study of machine learning models for a specific NLP task.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.CL · Mullosharaf K. Arabov · 2026-05-05 04:00

A Systematic Benchmark of Machine Transliteration Models for the Tajik-Farsi Language Pair: A Comparative Study from Rule-Based to Transformer Architectures

arXiv:2605.02270v1 · Announce type: new · Abstract: This paper presents the first comprehensive comparative analysis of modern machine learning architectures for transliteration between Tajik (Cyrillic script) and Persian (Arabic script). A key contribution is the creation and valida…
arXiv cs.CL · Mullosharaf K. Arabov · 2026-05-04 06:24

A Systematic Benchmark of Machine Transliteration Models for the Tajik-Farsi Language Pair: A Comparative Study from Rule-Based to Transformer Architectures

This paper presents the first comprehensive comparative analysis of modern machine learning architectures for transliteration between Tajik (Cyrillic script) and Persian (Arabic script). A key contribution is the creation and validation of a unique parallel corpus aggregated from…
