PulseAugur

New Tatoxa system enhances text detoxification for Tatar language

Researchers have developed Tatoxa, a system for detecting and mitigating harmful online content in the Tatar language. On key quality metrics, it outperforms existing open-source and commercial large language models. The project also introduces a new Tatar text detoxification dataset for fine-tuning and evaluation. The findings indicate that cross-lingual transfer from languages such as Russian is less effective than using native Tatar data, even when substantial Russian corpora are available.

IMPACT This research could improve online safety for speakers of low-resource languages by providing specialized tools for content moderation.

RANK_REASON The cluster contains an academic paper detailing a new system for text detoxification in a low-resource language.


AI-generated summary · Google Gemini · from 3 sources.


COVERAGE [3]

  1. arXiv cs.CL TIER_1 English(EN) · Ilseyar Alimova, Bogdan Monogov, Artyom Mazur, Daniil Antonov, Vsevolod Karimov, Vitaliy Egorov, Bulat Khakimov, Alexander Panchenko

    The Tatoxa System for Text Detoxification in Low-Resource Languages: The Case of Tatar

    arXiv:2606.26015v1. Abstract: Text detoxification, the automated detection and mitigation of abusive and harmful content, is essential for ensuring the safety of online communities and protecting users. However, low resource languages such as Tatar have received…

  2. arXiv cs.CL TIER_1 English(EN) · Alexander Panchenko

    The Tatoxa System for Text Detoxification in Low-Resource Languages: The Case of Tatar

    Text detoxification, the automated detection and mitigation of abusive and harmful content, is essential for ensuring the safety of online communities and protecting users. However, low resource languages such as Tatar have received little research attention. In this paper we pre…

  3. Hugging Face Daily Papers TIER_1 English(EN)

    The Tatoxa System for Text Detoxification in Low-Resource Languages: The Case of Tatar

    Text detoxification, the automated detection and mitigation of abusive and harmful content, is essential for ensuring the safety of online communities and protecting users. However, low resource languages such as Tatar have received little research attention. In this paper we pre…