New Korean Dataset Tackles Obfuscated Toxic Language in LLMs

By PulseAugur Editorial · 1 source · 2026-05-29 04:00

Researchers have introduced KOTOX, a dataset for detecting and detoxifying toxic Korean language, with a focus on cases where users disguise toxicity through obfuscation. The dataset categorizes Korean obfuscation patterns and defines transformation rules derived from real-world examples. Applying these rules produces paired neutral, toxic, and obfuscated versions of the same sentence. According to the authors, models trained on KOTOX perform better on obfuscated text while maintaining their performance on non-obfuscated text, a step toward mitigating disguised toxic expressions in Korean language models.
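To make rule-based obfuscation more concrete, here is a minimal Python sketch that applies a transformation rule to the toxic span of a sentence and returns a neutral/toxic/obfuscated triplet. The two rules are hypothetical stand-ins: `split_jamo` writes each Hangul syllable as its separate consonant and vowel letters (jamo), and `insert_dots` puts a separator between characters. The names `Triplet` and `make_triplet`, along with the example sentences, are likewise illustrative assumptions. None of this reproduces KOTOX's actual taxonomy, rules, or pipeline.

```python
from dataclasses import dataclass
from typing import Callable

# Hangul compatibility jamo, listed in the order Unicode uses to compose syllables.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 leading consonants
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"      # 21 vowels
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
          "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
          "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]  # index 0 = no final

SYLLABLE_BASE = 0xAC00   # first precomposed Hangul syllable (가)
SYLLABLE_COUNT = 11172   # 19 * 21 * 28 precomposed syllables


def split_jamo(text: str) -> str:
    """Hypothetical rule: write each Hangul syllable as separate jamo letters."""
    out = []
    for ch in text:
        offset = ord(ch) - SYLLABLE_BASE
        if 0 <= offset < SYLLABLE_COUNT:
            # Syllable index = (initial * 21 + medial) * 28 + final.
            initial, rest = divmod(offset, 21 * 28)
            medial, final = divmod(rest, 28)
            out.append(INITIALS[initial] + MEDIALS[medial] + FINALS[final])
        else:
            out.append(ch)  # leave non-syllable characters unchanged
    return "".join(out)


def insert_dots(text: str) -> str:
    """Hypothetical rule: put a separator between every pair of characters."""
    return ".".join(text)


@dataclass
class Triplet:
    neutral: str
    toxic: str
    obfuscated: str


def make_triplet(neutral: str, toxic: str, target: str,
                 rule: Callable[[str], str]) -> Triplet:
    """Obfuscate only the toxic span `target` inside the toxic sentence."""
    if target not in toxic:
        raise ValueError("target span must occur in the toxic sentence")
    return Triplet(neutral, toxic, toxic.replace(target, rule(target)))


if __name__ == "__main__":
    # "너는 친구야" = "you are a friend"; "너는 바보야" = "you are a fool".
    t = make_triplet("너는 친구야", "너는 바보야", "바보", split_jamo)
    print(t.obfuscated)  # → 너는 ㅂㅏㅂㅗ야
    print(make_triplet("너는 친구야", "너는 바보야", "바보", insert_dots).obfuscated)  # → 너는 바.보야
```

Because the rule touches only the target span, the rest of the sentence stays identical across the three versions. Any difference between the toxic and obfuscated versions therefore comes from the disguise itself.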

IMPACT Could improve LLM safety by strengthening detection of disguised toxic language in Korean.

RANK_REASON The cluster describes a new academic paper and dataset released on arXiv.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

arXiv cs.AI · Yejin Lee, Su-Hyeon Kim, Hyundong Jin, Dayoung Kim, Yeonsoo Kim, Yo-Sub Han · 2026-05-29 04:00

Obfuscation Rules for Detecting and Detoxifying Korean Toxicity

arXiv:2510.10961v3 · Announce Type: replace-cross

Abstract: As language models become increasingly deployed in online environments, toxicity detection and detoxification have received growing attention. Existing studies primarily focus on non-obfuscated text, which limits robustnes…
