Researchers have introduced KOTOX, a new dataset designed to improve the detection and detoxification of toxic language in Korean, particularly when users employ obfuscation techniques. The dataset categorizes Korean obfuscation patterns and provides transformation rules derived from real-world examples, enabling the creation of paired neutral, toxic, and obfuscated sentences. Models trained on KOTOX demonstrate enhanced performance on obfuscated text without compromising their ability to handle non-obfuscated content, marking a significant step in mitigating disguised toxic expressions in Korean language models.
AI IMPACT Enhances LLM safety by improving detection of disguised toxic language in Korean.
RANK_REASON The cluster describes a new academic paper and dataset released on arXiv.
AI-generated summary · Google Gemini · from 1 source.