Dealing with Annotator Disagreement in Hate Speech Classification
Researchers have developed new methods for handling disagreement among human annotators when classifying hate speech. Their work explores several aggregation techniques, including majority voting and regression-based approaches, to make better use of the information that these disagreements carry. The study shows that discarding samples without annotator consensus leads to overly optimistic results, and that modeling annotator disagreement can improve the robustness and reliability of hate speech detection systems. The approach also establishes new state-of-the-art results for Turkish tweets.
IMPACT Improves the reliability of AI systems for detecting harmful online content by better modeling human subjectivity.
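The sketch below contrasts the aggregation choices mentioned above on a few hypothetical tweets. Majority voting collapses each sample's annotations into one hard label. A regression-style target keeps the fraction of annotators who marked the sample as hateful. Consensus filtering keeps only unanimous samples. The data and function names are illustrative assumptions, not the study's code or dataset.

```python
from collections import Counter

# Hypothetical data: each inner list holds binary labels (1 = hate, 0 = not hate)
# given by different annotators to the same tweet.
annotations = [
    [1, 1, 1],  # unanimous: hate
    [1, 0, 1],  # disagreement, majority says hate
    [0, 1, 0],  # disagreement, majority says not hate
    [0, 0, 0],  # unanimous: not hate
]

def majority_vote(labels):
    """Hard label: the most frequent annotation (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]

def soft_label(labels):
    """Regression target: the fraction of annotators who labeled the tweet as hate."""
    return sum(labels) / len(labels)

def is_consensus(labels):
    """True only when every annotator gave the same label."""
    return len(set(labels)) == 1

hard = [majority_vote(a) for a in annotations]
soft = [soft_label(a) for a in annotations]
# Filtering to consensus samples drops exactly the ambiguous, harder cases.
consensus_only = [a for a in annotations if is_consensus(a)]

print(hard)                 # [1, 1, 0, 0]
print(len(consensus_only))  # 2
```

Note that the consensus filter silently removes the two ambiguous tweets. A model evaluated only on the remaining unanimous cases never faces the examples on which humans themselves disagree, which is one way such filtering can make results look overly optimistic.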