PulseAugur

New framework FiMi-RM tackles length bias in RLHF reward models

Researchers have developed a new framework, FiMi-RM, to address length bias in the reward models used for Reinforcement Learning from Human Feedback (RLHF). Length bias causes a reward model to favor longer responses even when they are not of higher quality. FiMi-RM works in three stages: (1) train a standard reward model; (2) fit a lightweight model that captures the non-linear relationship between response length and reward; and (3) integrate this learned bias into the reward model so that length is decoupled from reward. Experiments show that FiMi-RM yields a more balanced length-reward distribution and improves alignment algorithms such as Direct Preference Optimization (DPO), reducing verbosity without sacrificing performance.
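The three stages can be illustrated with a small NumPy sketch on synthetic data. This is not the FiMi-RM code. Several pieces are assumptions made only for illustration: the bias model (a cubic polynomial in log-length), the hypothetical function names (`fit_length_bias`, `length_bias`, `debias_rewards`, `pairwise_bias_loss`), and the way the bias is integrated. That last piece, adding the frozen bias to each reward inside a Bradley-Terry loss, is a product-of-experts-style debiasing technique swapped in here, not the paper's stated method.

```python
import numpy as np


def fit_length_bias(lengths, rewards, degree=3):
    """Hypothetical stage-2 stand-in: fit a small non-linear model reward ~ f(length).

    Assumption: f is a polynomial in log(1 + length); returns np.polyfit coefficients.
    """
    x = np.log1p(np.asarray(lengths, dtype=float))
    y = np.asarray(rewards, dtype=float)
    return np.polyfit(x, y, degree)


def length_bias(coeffs, lengths):
    """Evaluate the fitted length-reward curve at the given lengths."""
    return np.polyval(coeffs, np.log1p(np.asarray(lengths, dtype=float)))


def debias_rewards(rewards, lengths, coeffs):
    """Post-hoc variant: keep only the part of each reward the length curve cannot explain."""
    return np.asarray(rewards, dtype=float) - length_bias(coeffs, lengths)


def pairwise_bias_loss(r_chosen, r_rejected, b_chosen, b_rejected):
    """Bradley-Terry loss with a fixed length-bias term added to each reward.

    Intent (product-of-experts style): the frozen bias term accounts for
    length-driven preferences, so in an autodiff training loop the reward model
    is pushed to explain the remaining signal; at inference only r is used.
    -log(sigmoid(m)) is computed stably as logaddexp(0, -m).
    """
    margin = (np.asarray(r_chosen) + b_chosen) - (np.asarray(r_rejected) + b_rejected)
    return float(np.mean(np.logaddexp(0.0, -margin)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lengths = rng.integers(20, 2000, size=500)
    quality = rng.normal(size=500)  # synthetic "true" quality, independent of length
    # Synthetic biased rewards: a non-linear length trend plus quality.
    rewards = 0.3 * np.log1p(lengths) ** 2 + quality

    coeffs = fit_length_bias(lengths, rewards)
    debiased = debias_rewards(rewards, lengths, coeffs)
    log_len = np.log1p(lengths)
    print("corr(reward, log-length) before:", np.corrcoef(rewards, log_len)[0, 1])
    # ~0 by construction: least-squares residuals are orthogonal to log-length.
    print("corr(reward, log-length) after: ", np.corrcoef(debiased, log_len)[0, 1])
```

Subtracting a fitted curve only removes the trend that curve can represent. The bias-in-the-logit loss is closer in spirit to "integrating the learned bias into the reward model," but the paper itself describes how FiMi-RM actually performs stage 3.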

IMPACT Addresses a key limitation in RLHF, potentially leading to more aligned and concise LLM responses.

RANK_REASON Academic paper detailing a new method for mitigating bias in RLHF reward models.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. arXiv cs.LG · TIER_1 · English (EN) · Kangwen Zhao, Jianfeng Cai, Jinhua Zhu, Ruopei Sun, Dongyun Xue, Wengang Zhou, Li Li, Houqiang Li

    Bias Fitting to Mitigate Length Bias of Reward Model in RLHF

    arXiv:2505.12843v2 (Announce Type: replace). Abstract: Reinforcement Learning from Human Feedback (RLHF) relies on reward models to align large language models with human preferences. However, RLHF often suffers from reward hacking, wherein policy learning exploits flaws in the trai…