Researchers from Zhejiang University, Xiaohongshu, and Peking University have developed SelectiveRM, a new framework for training reward models for large language models (LLMs). The method tackles noisy preference data, a common problem in both human and AI-generated feedback, by using optimal transport to align distributions selectively. SelectiveRM identifies and discards conflicting, noisy preferences. This lets the model learn a more reliable reward function and improves the safety of downstream reinforcement learning from human feedback (RLHF).
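The summary does not spell out SelectiveRM's objective. The sketch below is a generic illustration of how optimal transport can align two distributions selectively instead of forcing every sample to be matched. It uses a swapped-in standard technique, entropic unbalanced optimal transport: the marginal constraints are relaxed with a KL penalty, and the problem is solved with Sinkhorn-style scaling iterations. Samples that are expensive to match shed most of their mass and are flagged as candidates to discard. Several pieces are hypothetical: the 1-D features, the squared-distance cost, `eps`, `tau`, the 0.5 threshold, and the helper names `unbalanced_sinkhorn` and `flag_unmatched`. This is not the authors' algorithm.

```python
import math


def unbalanced_sinkhorn(a, b, cost, eps=0.1, tau=1.0, iters=500):
    """Entropic OT with KL-relaxed marginals, solved by scaling iterations.

    a, b: source and target weights; cost: len(a) x len(b) matrix.
    Returns the transport plan P as a list of rows.
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    # tau -> infinity recovers balanced Sinkhorn (exponent -> 1).
    fi = tau / (tau + eps)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        kv = [sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        u = [(a[i] / kv[i]) ** fi for i in range(n)]
        ktu = [sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
        v = [(b[j] / ktu[j]) ** fi for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def flag_unmatched(source, target, eps=0.1, tau=1.0, threshold=0.5):
    """Return indices of source samples that keep < threshold of their mass.

    source/target are hypothetical 1-D features standing in for preference
    pairs; uniform weights and a squared-distance cost are assumptions.
    """
    a = [1.0 / len(source)] * len(source)
    b = [1.0 / len(target)] * len(target)
    cost = [[(x - y) ** 2 for y in target] for x in source]
    plan = unbalanced_sinkhorn(a, b, cost, eps, tau)
    # Fraction of each source sample's mass that the plan actually transports.
    kept = [sum(row) / a_i for row, a_i in zip(plan, a)]
    return [i for i, k in enumerate(kept) if k < threshold]


if __name__ == "__main__":
    # Three samples near the target plus one far-away sample.
    print(flag_unmatched([0.0, 0.1, 0.2, 3.0], [0.0, 0.1, 0.2]))
```

With three samples near the target and one far away, only the far-away sample (index 3) should be flagged. As `tau` grows, the exponent `tau / (tau + eps)` approaches 1. The iterations then reduce to ordinary balanced Sinkhorn, in which every sample must be fully transported and nothing can be dropped.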
IMPACT: Improves LLM safety and reliability by enabling reward models to better handle noisy human feedback.
RANK_REASON: The cluster describes a new research paper and framework (SelectiveRM), presented at ICML 2026, that details a novel method for training reward models for LLMs.
- GRPO
- HarmBench
- Optimal Transport
- Peking University
- Qwen2.5
- RLHF
- SelectiveRM
- Xiaohongshu
- Zhejiang University
- LLM-as-a-Judge
AI-generated summary · Google Gemini · from 2 sources.