中文(ZH) Training Reward Models from an Optimal Transport Perspective: Teaching RLHF to 'Ignore Incorrect Preferences' | ICML 2026

SelectiveRM framework trains reward models to ignore noisy preferences

By PulseAugur Editorial · [2 sources] · 2026-06-15 07:39

Researchers from Zhejiang University, Xiaohongshu, and Peking University have developed SelectiveRM, a framework for training reward models for large language models. The method targets noisy preference data, which is common in both human and AI-generated feedback, by using optimal transport to align distributions selectively. SelectiveRM identifies and discards conflicting, noisy preferences, so the model learns a more reliable reward function, which in turn improves the safety of downstream reinforcement learning from human feedback (RLHF).
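The summary does not spell out SelectiveRM's optimal transport formulation, so the sketch below is not the authors' method. It swaps in a much simpler stand-in, loss trimming, only to illustrate the general idea of dropping preference pairs that conflict with the current reward model. It assumes the standard Bradley-Terry pairwise loss; the names `bt_loss`, `selective_bt_loss`, and `keep_fraction` are hypothetical.

```python
import math


def bt_loss(r_chosen, r_rejected):
    """Bradley-Terry negative log-likelihood: -log(sigmoid(r_chosen - r_rejected))."""
    margin = r_chosen - r_rejected
    # Numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def selective_bt_loss(pairs, keep_fraction=0.8):
    """Average the loss over the lowest-loss fraction of pairs only.

    Assumption (illustrative, not from the paper): pairs with the highest
    loss are treated as likely label noise and are ignored.
    Returns (mean loss over kept pairs, sorted indices of kept pairs).
    """
    losses = [bt_loss(chosen, rejected) for chosen, rejected in pairs]
    order = sorted(range(len(pairs)), key=lambda i: losses[i])
    n_keep = max(1, int(round(keep_fraction * len(pairs))))
    kept = sorted(order[:n_keep])
    return sum(losses[i] for i in kept) / n_keep, kept


# Reward scores (chosen, rejected) for five preference pairs.
# The last pair strongly contradicts the model and mimics a flipped label.
pairs = [(2.0, 0.0), (1.5, 0.5), (1.0, 0.0), (0.5, 0.0), (-3.0, 3.0)]
plain_mean = sum(bt_loss(c, r) for c, r in pairs) / len(pairs)
selective_mean, kept_indices = selective_bt_loss(pairs, keep_fraction=0.8)
```

With `keep_fraction=0.8`, the contradictory fifth pair is excluded, so it no longer dominates the average loss. A hard cutoff like this is the crudest form of selection; an optimal transport approach, as the summary describes it, would instead decide which preferences to align at the level of distributions.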

IMPACT Improves LLM safety and reliability by enabling reward models to better handle noisy human feedback.

RANK_REASON The cluster describes a new research paper and framework (SelectiveRM) presented at ICML 2026, detailing a novel method for training reward models in LLMs.


AI-generated summary · Google Gemini · from 2 sources.


COVERAGE [2]

雷峰网 (Leiphone) TIER_1 中文(ZH) · 2026-06-15 07:39

Training Reward Models from an Optimal Transport Perspective: Teaching RLHF to 'Ignore Incorrect Preferences' | ICML 2026


