
arXiv cs.LG · English

When Surface Form Changes Moderation Decisions: A Paired Study of Code-Mixed Workflow Instability

A new study published on arXiv examines how code-mixed language affects hate speech moderation systems. The researchers found that when content is written in a mix of English and Tamil, moderation systems become markedly unstable: decisions flip in 26.5% of cases relative to the paired clean-English inputs. This instability increases the review burden and raises the rate at which non-hateful content is falsely flagged. The study argues that current evaluation methods, which focus solely on clean English inputs, fail to capture these critical workflow failures.
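A paired flip rate like the 26.5% figure can be illustrated with a short sketch. It assumes, as a hypothetical definition not taken from the paper, that a "flip" is any pair in which the decision on the code-mixed variant differs from the decision on its clean-English counterpart. The function name `flip_rate`, the decision labels, and the toy data are all illustrative and are not from the study.

```python
from typing import Sequence


def flip_rate(clean: Sequence[str], mixed: Sequence[str]) -> float:
    """Fraction of paired items whose moderation decision differs between the
    clean-English input and its code-mixed counterpart.

    Hypothetical definition for illustration; the study's exact metric may differ.
    """
    if len(clean) != len(mixed):
        raise ValueError("paired decision lists must have equal length")
    if not clean:
        raise ValueError("need at least one pair")
    # Count pairs where the two decisions disagree.
    flips = sum(1 for c, m in zip(clean, mixed) if c != m)
    return flips / len(clean)


# Toy decisions (not study data): pairs 0 and 3 disagree, so 2 of 4 flip.
clean_decisions = ["allow", "flag", "allow", "allow"]
mixed_decisions = ["flag", "flag", "allow", "review"]
print(flip_rate(clean_decisions, mixed_decisions))  # 0.5
```

Comparing decisions pair by pair, rather than comparing aggregate flag rates, is what makes the measure sensitive to instability: two systems can flag the same total number of items while disagreeing on many individual pairs.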

IMPACT: Highlights critical failures in AI moderation systems when they encounter non-standard language, potentially affecting real-world content filtering.

arXiv
Suraj Babu Thimma Krishnaram