Training Reward Models from an Optimal Transport Perspective: Teaching RLHF to "Ignore Incorrect Preferences" | ICML 2026

SelectiveRM framework trains reward models to ignore noisy preferences

By the PulseAugur editorial team · [2 sources] · 2026-06-15 07:39

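Researchers from Zhejiang University, Xiaohongshu, and Peking University have developed SelectiveRM, a framework for training reward models for large language models. The method uses optimal transport to align distributions selectively, which addresses the noisy preference data common in both human and AI-generated feedback. SelectiveRM identifies and discards conflicting, noisy preferences, so the model can learn a more reliable reward function and downstream reinforcement learning from human feedback (RLHF) becomes safer.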
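The summary does not spell out how SelectiveRM sets up its transport problem, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a Bradley-Terry loss per preference pair and a small entropic optimal transport problem (solved with Sinkhorn iterations) that moves each pair's mass to either a "keep" bin or a zero-cost "discard" bin. The function names `sinkhorn` and `keep_weights` and the parameters `discard_frac` and `eps` are hypothetical choices made for illustration.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.5, n_iters=500):
    """Entropic OT plan between marginals a (rows) and b (columns)."""
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale rows toward marginal a
        v = b / (K.T @ u)            # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]

def keep_weights(margins, discard_frac=0.25, eps=0.5):
    """Soft per-pair keep weights from a two-bin transport problem.

    margins[i] = reward(chosen_i) - reward(rejected_i) under the current
    reward model. Illustrative assumption: pairs whose label the model
    strongly contradicts are the likely noisy ones.
    """
    margins = np.asarray(margins, dtype=float)
    n = margins.size
    bt_loss = np.logaddexp(0.0, -margins)             # -log sigmoid(margin)
    # Column 0: cost of keeping the pair; column 1: zero-cost discard bin.
    cost = np.stack([bt_loss, np.zeros(n)], axis=1)
    a = np.full(n, 1.0 / n)                           # uniform mass per pair
    b = np.array([1.0 - discard_frac, discard_frac])  # keep / discard budget
    plan = sinkhorn(cost, a, b, eps=eps)
    return n * plan[:, 0], bt_loss                    # weights roughly in [0, 1]

def selective_loss(margins, discard_frac=0.25, eps=0.5):
    """Bradley-Terry loss reweighted by the transport-derived keep weights."""
    w, bt_loss = keep_weights(margins, discard_frac, eps)
    return float(np.sum(w * bt_loss) / np.sum(w))

if __name__ == "__main__":
    # Three pairs the model agrees with, one it strongly contradicts.
    w, _ = keep_weights([2.0, 1.5, 1.8, -3.0])
    print(np.round(w, 3))
```

In this toy setup the contradicted pair receives the smallest keep weight, so it contributes little to the reweighted loss. In practice `discard_frac` would need to reflect an estimate of the noise rate, which the summary does not provide.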

Impact: By enabling reward models to handle noisy human feedback better, the work improves the safety and reliability of LLMs.

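Ranking rationale: This cluster describes a new research paper and framework (SelectiveRM) published at ICML 2026 that details a novel approach to training reward models for LLMs.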


AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

Leiphone (雷峰网) · TIER_1 · Chinese (ZH) · 2026-06-15 07:39

Training Reward Models from Optimal Transport Perspective: Enabling RLHF to Learn to 'Ignore Incorrect Preferences' | ICML 2026

