A new paper analyzes annotation variation in NLP datasets, focusing on harmful language detection. The research combines annotator characteristics with linguistic properties of the data to explain labeling discrepancies. The findings indicate that interactions between annotator traits and item features, particularly annotator attitudes and lexical cues, are crucial to explaining those discrepancies. However, the patterns vary significantly across datasets, which cautions against overgeneralization.
AI IMPACT Highlights the importance of considering both annotator and data characteristics for reliable NLP model training.
RANK_REASON The cluster contains an academic paper published on arXiv.
AI-generated summary · Google Gemini · from 2 sources.