Researchers have developed Tatoxa, a new system designed to detect and mitigate harmful online content specifically for the Tatar language. The system outperforms existing open-source and commercial large language models on key quality metrics. The project also includes a new Tatar text detoxification dataset for fine-tuning and evaluation. Findings indicate that cross-lingual transfer from languages such as Russian is less effective than training on native Tatar data, even when substantial Russian corpora are available.
AI IMPACT This research could improve online safety for speakers of low-resource languages by providing specialized tools for content moderation.
RANK_REASON The cluster contains an academic paper detailing a new system for text detoxification in a low-resource language.
AI-generated summary · Google Gemini · from 3 sources.