PulseAugur
The Tatoxa System for Text Detoxification in Low-Resource Languages: The Case of Tatar

New Tatoxa system improves text detoxification for Tatar

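Researchers have developed Tatoxa, a system built specifically to detect and mitigate harmful online content in Tatar. In evaluations on key quality metrics, it outperforms existing open-source and commercial large language models. The project also created a new Tatar text detoxification dataset for fine-tuning and evaluation. The results indicate that cross-lingual transfer from languages such as Russian is less effective than native Tatar data, even when a large Russian corpus is available.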

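Impact: By providing a dedicated content moderation tool, this research could improve online safety for speakers of low-resource languages.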

Ranking rationale: This cluster contains an academic paper describing a new system for text detoxification in low-resource languages.

AI-generated summary · Google Gemini · based on 3 sources.

Sources [3]

  1. arXiv cs.CL · English · Ilseyar Alimova, Bogdan Monogov, Artyom Mazur, Daniil Antonov, Vsevolod Karimov, Vitaliy Egorov, Bulat Khakimov, Alexander Panchenko

    The Tatoxa System for Text Detoxification in Low-Resource Languages: The Case of Tatar

    arXiv:2606.26015v1. Abstract: Text detoxification, the automated detection and mitigation of abusive and harmful content, is essential for ensuring the safety of online communities and protecting users. However, low resource languages such as Tatar have received…

  2. arXiv cs.CL · English · Alexander Panchenko

    The Tatoxa System for Text Detoxification in Low-Resource Languages: The Case of Tatar

    Text detoxification, the automated detection and mitigation of abusive and harmful content, is essential for ensuring the safety of online communities and protecting users. However, low resource languages such as Tatar have received little research attention. In this paper we pre…

  3. Hugging Face Daily Papers · English

    The Tatoxa System for Text Detoxification in Low-Resource Languages: The Case of Tatar

    Text detoxification, the automated detection and mitigation of abusive and harmful content, is essential for ensuring the safety of online communities and protecting users. However, low resource languages such as Tatar have received little research attention. In this paper we pre…