Annotation quality drops over time, GPT-5 leads sentiment classification

By PulseAugur Editorial · [1 source] · 2026-05-26 16:21

A new study on sentiment analysis in Setswana tweets finds that annotation quality declines over the course of a labeling campaign: inter-annotator agreement drops substantially for tweets labeled days apart compared with those labeled within a minute of each other. Temporal simultaneity, not annotation speed or linguistic features, was the strongest predictor of agreement. The study also evaluated several language models; GPT-5 achieved the highest macro-F1 score in few-shot sentiment classification.
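For readers unfamiliar with the metric, macro-F1 is the unweighted mean of the per-class F1 scores, so a rare sentiment class counts as much as a common one. The sketch below is a minimal illustration only: the function name `macro_f1` and the toy labels are made up for this example and are not taken from the study.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy labels (hypothetical, not study data).
gold = ["pos", "neg", "neu", "pos"]
pred = ["pos", "neg", "pos", "pos"]
score = macro_f1(gold, pred)
```

In this toy case the missed "neu" class pulls the score down sharply even though three of four predictions are correct, which is why macro-F1 is often preferred for imbalanced sentiment data.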

IMPACT Highlights the challenge of maintaining annotation quality in NLP tasks and benchmarks LLM performance on sentiment analysis.

RANK_REASON The cluster contains an academic paper detailing research findings on annotation quality and model performance.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-26 16:21

Temporal Simultaneity Predicts Annotation Quality in Sentiment Corpora

Annotation quality is difficult to sustain when campaigns span weeks or months with small annotator pools. We present a Setswana sentiment dataset of 3,565 tweets annotated by three native-speaker annotators across eight batches and examine why inter-annotator agreement (IAA) dec…
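This excerpt does not say which agreement statistic the paper uses. As one common choice for a fixed pool of three annotators, Fleiss' kappa can be sketched as follows; the function name `fleiss_kappa` and the toy ratings are assumptions for illustration, not the authors' method or data.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels.

    Assumes every item was labeled by the same number of raters (>= 2).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(item) for item in ratings]
    categories = sorted({label for item in ratings for label in item})

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall label proportions.
    proportions = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p ** 2 for p in proportions)

    if p_expected == 1.0:
        return 1.0  # only one label was ever used; treat as full agreement
    return (p_observed - p_expected) / (1 - p_expected)


# Toy ratings from three hypothetical annotators (not study data).
full_agreement = [["pos", "pos", "pos"], ["neg", "neg", "neg"]]
split_votes = [["pos", "pos", "neg"], ["neg", "neg", "pos"]]
```

Computing such a statistic separately for each batch, or for groups of tweets bucketed by the time gap between annotations, is one way a decline like the one described could be made visible.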
