A new study of sentiment analysis on Setswana tweets finds that annotation consistency depends on timing: inter-annotator agreement drops substantially when tweets are labeled days apart than when they are labeled within a minute of each other. Temporal simultaneity, not annotation speed or linguistic features, was the strongest predictor of agreement. The study also evaluated several language models; GPT-5 achieved the highest macro-F1 score in few-shot sentiment classification.
AI IMPACT Highlights the challenge of maintaining annotation quality in NLP tasks and benchmarks LLM performance on sentiment analysis.
RANK_REASON The cluster contains an academic paper detailing research findings on annotation quality and model performance.
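For readers unfamiliar with the macro-F1 metric cited in the summary, the following is a minimal sketch of how it is commonly computed for a three-class sentiment task: per-class F1 scores are averaged with equal weight, so rare classes count as much as frequent ones. The function name `macro_f1` and the toy labels are assumptions made for illustration; they are not taken from the study, and the study's own evaluation code may differ.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted p, but it was wrong
            fn[t] += 1          # missed the true class t
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, NOT data from the study.
gold = ["pos", "neg", "neu", "pos", "neg", "neu"]
pred = ["pos", "neg", "pos", "pos", "neu", "neu"]
print(round(macro_f1(gold, pred), 3))
```

Because every class contributes equally to the average, a model that ignores a minority sentiment class is penalized more heavily under macro-F1 than under plain accuracy.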
Source: Hugging Face Daily Papers.
AI-generated summary · Google Gemini · from 1 source.