PulseAugur

New framework rethinks hate speech model evaluation with human rationales

Researchers have developed a new framework for evaluating hate speech detection models that examines how human explanations (token-level rationales) vary across annotators, rather than reducing annotations to a single majority vote. The study proposes organizing classification metrics by predictive and distributional properties, and explainability metrics by plausibility, faithfulness, and complexity. The results indicate that softer (distribution-based) representations of labels and rationales better capture human disagreement and differing reasoning styles in subjective NLP tasks.
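To make the contrast between majority-vote ("hard") and distribution-based ("soft") aggregation concrete, here is a minimal Python sketch on hypothetical toy data. It is not the paper's code or its exact metric suite. The metric choices are assumptions for illustration: Jensen-Shannon divergence stands in for a distributional classification metric, token-level F1 against a majority-vote rationale stands in for hard plausibility, and cosine similarity against a per-token soft rationale stands in for soft plausibility. The 0.5 saliency threshold is also arbitrary.

```python
import math
from collections import Counter

# Hypothetical toy data (assumption, not from the paper): 5 annotators label
# one post, and 3 annotators highlight rationale tokens in a 4-token post.
CLASSES = ["hate", "not_hate"]
LABELS = ["hate", "hate", "not_hate", "not_hate", "hate"]
MASKS = [
    [0, 1, 0, 1],  # 1 = token highlighted by this annotator
    [0, 0, 0, 1],
    [1, 1, 0, 1],
]


def majority_label(labels):
    """Hard label: most frequent class (ties go to the first one seen)."""
    return Counter(labels).most_common(1)[0][0]


def soft_label(labels, classes):
    """Soft label: share of annotators who chose each class."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in classes]


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits (0 = identical distributions)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def hard_rationale(masks):
    """Hard rationale: keep tokens highlighted by a strict majority."""
    n = len(masks)
    return [int(2 * sum(col) > n) for col in zip(*masks)]


def soft_rationale(masks):
    """Soft rationale: per-token share of annotators who highlighted it."""
    n = len(masks)
    return [sum(col) / n for col in zip(*masks)]


def token_f1(pred, gold):
    """Token-level F1 between two binary masks."""
    tp = sum(p * g for p, g in zip(pred, gold))
    if tp == 0:
        return 0.0
    precision, recall = tp / sum(pred), tp / sum(gold)
    return 2 * precision * recall / (precision + recall)


def cosine(u, v):
    """Cosine similarity between two score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Classification: two hypothetical models that both predict "hate".
gold_hard = majority_label(LABELS)       # "hate"
gold_soft = soft_label(LABELS, CLASSES)  # [0.6, 0.4]
confident = [0.95, 0.05]
calibrated = [0.6, 0.4]
for name, probs in [("confident", confident), ("calibrated", calibrated)]:
    correct = CLASSES[probs.index(max(probs))] == gold_hard
    jsd = jensen_shannon(probs, gold_soft)
    print(name, "correct:", correct, "JSD:", round(jsd, 3))

# Explainability: two hypothetical saliency maps, binarized at 0.5 for F1.
rat_hard = hard_rationale(MASKS)  # [0, 1, 0, 1]
rat_soft = soft_rationale(MASKS)  # approx. [0.33, 0.67, 0.0, 1.0]
sal_a = [0.0, 0.9, 0.0, 0.9]
sal_b = [0.3, 0.6, 0.0, 0.9]
for name, sal in [("a", sal_a), ("b", sal_b)]:
    binary = [int(s >= 0.5) for s in sal]
    f1 = token_f1(binary, rat_hard)
    print(name, "F1:", f1, "cosine:", round(cosine(sal, rat_soft), 3))
```

In this toy setup, both models are correct under the majority vote, and both saliency maps reach a perfect F1 against the hard rationale. The soft views still separate them. Only the calibrated model matches the 3-to-2 annotator split, and only saliency map b gives weight to the first token, which one annotator highlighted but the majority vote discards.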

IMPACT Introduces a novel evaluation framework for NLP models, potentially improving the robustness and fairness of hate speech detection systems.

RANK_REASON The cluster contains an academic paper detailing a new methodology for evaluating NLP models.


AI-generated summary · Google Gemini · from 2 sources.


COVERAGE [2]

  1. arXiv cs.CL TIER_1 English(EN) · Benedetta Muscato, Beiduo Chen, Gizem Gezici, Barbara Plank, Fosca Giannotti ·

    Disagreeing Rationales: Rethinking Classification and Explainability Evaluation in Hate Speech Detection

    arXiv:2605.31563v1 Announce Type: new Abstract: Human disagreement is ubiquitous and well-known in labeling. However, variation in explanations, captured through token-level human rationales, remains far less explored. At the same time, it is unclear how to best evaluate human la…

  2. arXiv cs.CL TIER_1 English(EN) · Fosca Giannotti ·

    Disagreeing Rationales: Rethinking Classification and Explainability Evaluation in Hate Speech Detection

    Human disagreement is ubiquitous and well-known in labeling. However, variation in explanations, captured through token-level human rationales, remains far less explored. At the same time, it is unclear how to best evaluate human labels and rationales -- or even how to best aggre…