Do We Still Need Humans in the Loop? Comparing Human and LLM Annotation in Active Learning for Hostility Detection
A new arXiv paper investigates how well Large Language Models (LLMs) can annotate data for active learning, specifically for hostility detection in online comments. The study found that LLMs, particularly GPT-5.2 with a two-question interface, can label data at a significantly lower cost than human annotators while achieving comparable or superior performance. However, active learning did not provide a reliable advantage over random sampling when LLMs served as annotators, and error structures varied across LLMs, with some misclassifying economic or border-control discourse.
IMPACT: LLM annotation offers a cost-effective alternative to human labeling for specific tasks, potentially accelerating data annotation for AI development.