A new research paper identifies a structural flaw in how hate speech datasets are annotated, centered on the boundary between offensive and hateful content. The study finds that annotator disagreement is not evenly distributed but concentrates heavily at this boundary, which suggests that annotators interpret the definition of hate speech differently. When that disagreement is collapsed into a single majority-vote label, models trained on the data are significantly less accurate on these contentious cases and are often highly confident in their incorrect predictions. The paper argues that the root cause is annotation design rather than model architecture, and it proposes upstream interventions in the annotation process.
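To make the majority-vote collapse concrete, here is a minimal sketch of how per-item disagreement disappears once votes are reduced to a single label. It is not the paper's method: the three-way label set, the function names, and the toy annotations are all assumptions invented for illustration, and entropy is used only as one common way to quantify disagreement.

```python
from collections import Counter
from math import log2

# Hypothetical label set; the paper's actual annotation scheme may differ.
LABELS = ("normal", "offensive", "hateful")


def majority_vote(votes):
    """Collapse annotator votes into one label (ties go to the label seen first)."""
    return Counter(votes).most_common(1)[0][0]


def soft_label(votes):
    """Keep the full vote distribution instead of a single label."""
    counts = Counter(votes)
    return {label: counts[label] / len(votes) for label in LABELS}


def entropy(dist):
    """Shannon entropy in bits; 0.0 means the annotators were unanimous."""
    return sum(p * log2(1 / p) for p in dist.values() if p > 0)


# Toy annotations, invented for illustration only.
annotations = {
    "item_a": ["hateful", "hateful", "hateful"],
    "item_b": ["offensive", "hateful", "offensive"],
    "item_c": ["normal", "normal", "normal"],
}

for item_id, votes in annotations.items():
    dist = soft_label(votes)
    print(item_id, majority_vote(votes), dist, round(entropy(dist), 3))
```

In this toy data, `item_b` receives the single label `offensive` under majority vote even though one of three annotators marked it `hateful`. The soft label and its entropy keep that signal, which is the kind of information a majority-vote dataset discards at the offensive/hateful boundary.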
IMPACT Highlights a flaw in data annotation that affects model accuracy and evaluation for sensitive content.
RANK_REASON Research paper published on arXiv detailing issues with hate speech annotation.
AI-generated summary · Google Gemini · from 1 source.