A new paper proposes three distinct models for understanding the role of human annotators in Reinforcement Learning from Human Feedback (RLHF) pipelines: 'extension,' where annotators mirror the designers' judgments; 'evidence,' where annotators supply factual information; and 'authority,' where annotators represent the views of a broader population. The paper argues that clarifying which model a given annotation task assumes can improve RLHF pipeline design and aggregation methods.
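The three models suggest different ways to aggregate annotator labels. A minimal sketch of what that could look like for binary preference labels; the function names, parameters, and toy data below are illustrative assumptions, not taken from the paper:

```python
from math import prod

def aggregate_extension(labels, designer_label):
    """'Extension' (assumed reading): annotators proxy the designers'
    judgment, so the designers' label is the target and annotator
    disagreement is a quality-control signal, not a vote."""
    agreement = sum(1 for l in labels if l == designer_label) / len(labels)
    return designer_label, agreement

def aggregate_evidence(labels, prior=0.5, accuracy=0.8):
    """'Evidence' (assumed reading): each label is a noisy observation of a
    latent fact; pool them as a posterior probability that the true label
    is 1, assuming annotators are independently correct with `accuracy`."""
    like1 = prod(accuracy if l == 1 else 1 - accuracy for l in labels)
    like0 = prod(accuracy if l == 0 else 1 - accuracy for l in labels)
    return prior * like1 / (prior * like1 + (1 - prior) * like0)

def aggregate_authority(labels):
    """'Authority' (assumed reading): annotators represent population views,
    so report the observed proportion rather than a single 'true' answer."""
    return sum(labels) / len(labels)
```

Under this sketch the same labels yield different aggregates: three-of-four votes for 1 give an 'authority' proportion of 0.75, but an 'evidence' posterior above 0.9, and under 'extension' the designers' label stands regardless, annotated with a 0.75 agreement rate.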
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Clarifies annotation models for RLHF, potentially improving alignment and safety.
RANK_REASON Academic paper analyzing RLHF annotation methods.