New framework rethinks hate speech model evaluation with human rationales

By PulseAugur Editorial · [2 sources] · 2026-05-29 17:29

Researchers have proposed a framework for evaluating hate speech detection models that accounts for variation in human explanations (token-level rationales) instead of reducing annotations to a simple majority vote. The study organizes classification metrics by their predictive and distributional properties, and explainability metrics by plausibility, faithfulness, and complexity. The results indicate that softer representations of labels and rationales are more effective at capturing human disagreement and differing reasoning styles in subjective NLP tasks.
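The summary does not spell out the paper's exact metrics. As an illustration only, the sketch below contrasts a majority-vote label with a soft label (the share of annotators choosing each class). It also builds a soft rationale (the per-token fraction of annotators who marked that token) and scores two hypothetical model predictions with cross-entropy against the soft label. The annotations and function names are invented for the example. Cross-entropy is a common distributional measure chosen here as a stand-in; it is not necessarily one of the metrics the authors use.

```python
from collections import Counter
import math

# Hypothetical annotations for one four-token post: each annotator gives a
# label and marks the tokens that justify it (1 = rationale token, 0 = not).
labels = ["hateful", "hateful", "not_hateful"]
rationales = [
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]


def majority_label(labels):
    """Hard aggregation: keep only the most frequent label."""
    return Counter(labels).most_common(1)[0][0]


def soft_label(labels, classes):
    """Soft aggregation: fraction of annotators choosing each class."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in classes]


def soft_rationale(rationales):
    """Per-token fraction of annotators who marked the token as a rationale."""
    n = len(rationales)
    return [sum(column) / n for column in zip(*rationales)]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy of a predicted distribution against a soft target."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


if __name__ == "__main__":
    classes = ["hateful", "not_hateful"]
    target = soft_label(labels, classes)
    print("majority label:", majority_label(labels))
    print("soft label:", target)
    print("soft rationale:", soft_rationale(rationales))

    # Both hypothetical models predict "hateful", so accuracy against the
    # majority vote cannot tell them apart; cross-entropy against the soft
    # label rewards the model whose confidence mirrors annotator disagreement.
    overconfident = [0.99, 0.01]
    calibrated = [0.70, 0.30]
    print("CE overconfident:", round(cross_entropy(target, overconfident), 3))
    print("CE calibrated:", round(cross_entropy(target, calibrated), 3))
```

The majority vote discards the dissenting annotator entirely. The soft label and soft rationale keep that disagreement as information a model can be evaluated against.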

IMPACT Introduces a novel evaluation framework for NLP models, potentially improving the robustness and fairness of hate speech detection systems.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.CL TIER_1 English(EN) · Benedetta Muscato, Beiduo Chen, Gizem Gezici, Barbara Plank, Fosca Giannotti · 2026-06-01 04:00

Disagreeing Rationales: Rethinking Classification and Explainability Evaluation in Hate Speech Detection

arXiv:2605.31563v1 Announce Type: new Abstract: Human disagreement is ubiquitous and well-known in labeling. However, variation in explanations, captured through token-level human rationales, remains far less explored. At the same time, it is unclear how to best evaluate human la…
arXiv cs.CL TIER_1 English(EN) · Fosca Giannotti · 2026-05-29 17:29

Disagreeing Rationales: Rethinking Classification and Explainability Evaluation in Hate Speech Detection

Human disagreement is ubiquitous and well-known in labeling. However, variation in explanations, captured through token-level human rationales, remains far less explored. At the same time, it is unclear how to best evaluate human labels and rationales -- or even how to best aggre…