A new paper proposes three distinct models of the role human annotator judgments play in shaping large language model behavior through Reinforcement Learning from Human Feedback (RLHF): 'extension,' where annotators act as an extension of the designers and are expected to align with their views; 'evidence,' where annotator judgments are treated as factual evidence; and 'authority,' where annotators stand in for broader societal consensus. The paper argues that RLHF pipelines should be tailored to these different roles rather than relying on a single unified approach, as the sketch below illustrates.
AI summary written by gemini-2.5-flash-lite from 2 sources.
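To make the distinction concrete, here is a minimal Python sketch of how the three models might translate into different ways of aggregating annotator labels before reward-model training. Everything here is an assumption for illustration: the function names, the per-annotator reliability weights, and the aggregation rules are hypothetical, not methods from the paper.

```python
# Hypothetical sketch: one label-aggregation strategy per model.
# None of these rules come from the paper; they are illustrative only.
from collections import Counter

def aggregate_extension(annotator_labels, designer_label):
    """'Extension': annotators stand in for the designers, so any label
    that disagrees with the designers' intended judgment is treated as
    annotator error and discarded."""
    agreeing = [l for l in annotator_labels if l == designer_label]
    return designer_label if agreeing else None

def aggregate_evidence(annotator_labels, reliability):
    """'Evidence': each label is a noisy observation of an underlying
    fact, so labels are weighted by an (assumed) per-annotator
    reliability estimate."""
    scores = Counter()
    for label, weight in zip(annotator_labels, reliability):
        scores[label] += weight
    return scores.most_common(1)[0][0]

def aggregate_authority(annotator_labels):
    """'Authority': annotators represent societal consensus, so a plain
    majority vote defines the ground truth."""
    return Counter(annotator_labels).most_common(1)[0][0]

labels = ["helpful", "harmful", "helpful"]
print(aggregate_extension(labels, designer_label="helpful"))    # helpful
print(aggregate_evidence(labels, reliability=[0.9, 0.6, 0.7]))  # helpful
print(aggregate_authority(labels))                              # helpful
```

The three functions can return different answers on the same labels (for example, aggregate_extension returns None when no annotator matches the designers' label), which is the sense in which a pipeline built for one role may mishandle feedback gathered under another.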
IMPACT Clarifies the normative role of human feedback in LLM alignment, potentially improving annotation strategies.
RANK_REASON Academic paper proposing new conceptual models for RLHF annotation.