PulseAugur

Annotation quality drops over time, GPT-5 leads sentiment classification

A new study of sentiment annotation in Setswana tweets finds that annotation quality declines over time: inter-annotator agreement drops substantially when tweets are labeled days apart rather than within a minute of each other. Temporal simultaneity, not annotation speed or linguistic features, was the strongest predictor of agreement. The study also evaluated several language models and found that GPT-5 achieved the highest macro-F1 score in few-shot sentiment classification.
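For readers unfamiliar with the metric, macro-F1 is the unweighted mean of the per-class F1 scores, so a rare sentiment class counts as much as a frequent one. The sketch below is a minimal illustration only: the function name `macro_f1` and the label strings are assumptions made for this example and are not taken from the paper or its evaluation code.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred.

    Illustrative helper (hypothetical name); not the paper's evaluation code.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was the true class but was missed

    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, purely for illustration.
gold = ["pos", "neg", "neu", "pos"]
pred = ["pos", "neg", "pos", "neg"]
score = macro_f1(gold, pred)
```

Because every class contributes equally, a model that ignores one class entirely (here, `neu`) is penalized with a zero for that class regardless of how few examples it has.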

IMPACT Highlights the challenge of maintaining annotation quality in NLP tasks and benchmarks LLM performance on sentiment analysis.

RANK_REASON The cluster contains an academic paper detailing research findings on annotation quality and model performance.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. Hugging Face Daily Papers · TIER_1 · English (EN)

    Temporal Simultaneity Predicts Annotation Quality in Sentiment Corpora

    Annotation quality is difficult to sustain when campaigns span weeks or months with small annotator pools. We present a Setswana sentiment dataset of 3,565 tweets annotated by three native-speaker annotators across eight batches and examine why inter-annotator agreement (IAA) dec…