The Ghost Annotator: a Framework to Explore Human Label Variation in Content Moderation through Conformal Prediction
Researchers have developed a new framework called the Ghost Annotator to analyze human label variation in content moderation tasks, particularly when LLMs are used for annotation. This framework combines conformal prediction with collaborative filtering to model LLM behavior against human annotators, identifying instances where model predictions diverge from human consensus. The study found that larger LLMs tend to be more confident in classifying content that doesn't align with any human annotation, and revealed a consistent pattern of demographic misalignment, suggesting biases in pretraining data.
AI IMPACT: This framework could help identify and mitigate biases in LLMs used for content moderation, leading to fairer and more reliable AI systems.
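To make the conformal prediction idea concrete, here is a minimal sketch of generic split conformal prediction for a binary moderation label. It is not the authors' implementation: the summary does not specify their scoring function or calibration setup, the collaborative filtering component is omitted, and all function names and toy numbers below are hypothetical. The sketch calibrates a threshold on held-out items, builds a prediction set for a new item, and flags the item when no human annotator's label falls inside that set.

```python
import math
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal threshold from nonconformity scores 1 - p(true label)."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank ceil((n + 1) * (1 - alpha)).
    # Capped at n for simplicity; strictly, the threshold is infinite in that case.
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    return float(np.sort(scores)[k - 1])

def prediction_set(probs, qhat):
    """Labels whose nonconformity score does not exceed the calibrated threshold."""
    return {int(c) for c in np.flatnonzero(1.0 - probs <= qhat)}

def diverges_from_annotators(probs, human_labels, qhat):
    """True when no human annotator's label is inside the model's prediction set."""
    return prediction_set(probs, qhat).isdisjoint(human_labels)

# Toy calibration data (hypothetical): model probabilities for [not harmful, harmful]
# and one reference label per calibration item.
cal_probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6], [0.7, 0.3]])
cal_labels = np.array([0, 0, 1, 1, 0])
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.2)

# A confident model prediction of class 0 on an item every annotator labeled 1.
confident_item = np.array([0.95, 0.05])
flagged = diverges_from_annotators(confident_item, {1}, qhat)
```

In this toy setup, an item is flagged only when the model's calibrated prediction set excludes every label that human annotators assigned, which is one plausible way to operationalize "confident but aligned with no human annotation".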