A new paper proposes three distinct models for understanding the role of human annotators in Reinforcement Learning from Human Feedback (RLHF) pipelines. These models are 'extension,' where annotators mirror designers' judgments; 'evidence,' where annotators provide factual information; and 'authority,' where annotators represent broader population views. The paper argues that clarifying which model is used for different annotation tasks can improve RLHF pipeline design and aggregation methods.
Impact: Clarifies annotation models for RLHF, potentially improving alignment and safety.
Ranking rationale: Academic paper analyzing RLHF annotation methods.
Read on Hugging Face Daily Papers
AI-generated summary · Google Gemini · from 1 source.