Brief · PulseAugur

TOOL · arXiv cs.AI · 8h

When Distance Distracts: Representation Distance Bias in BT-Loss for Reward Models

Researchers have identified a bias in the Bradley-Terry (BT) loss commonly used to train reward models for LLM alignment. The bias stems from representation distance: response pairs whose representations are far apart receive disproportionately strong updates, which can overshadow the fine-grained distinctions between similar responses. To address this, the paper proposes NormBT, an adaptive normalization scheme that rescales updates to balance the learning signal across pairs. The authors report gains of over 5% on RewardBench.
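The distance effect can be checked by hand in the simplest case. The sketch below is an illustration, not the paper's code: it assumes a linear reward head r(h) = w·h on top of fixed representations h, for which the BT gradient with respect to w is -(1 - sigmoid(margin)) * (h_chosen - h_rejected). Setting w to zero gives both pairs the same margin, so any difference in gradient norm comes from representation distance alone. All names (`bt_grad_linear_head`, `distance_normalized_grad`, the toy vectors) are hypothetical, and the final rescaling is only a stand-in for the general idea of normalization; the brief does not give NormBT's actual formula.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bt_loss(r_chosen, r_rejected):
    # Bradley-Terry loss: -log sigmoid(r_chosen - r_rejected)
    return math.log1p(math.exp(-(r_chosen - r_rejected)))


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def bt_grad_linear_head(w, h_chosen, h_rejected):
    # ASSUMPTION: reward is linear in the representation, r(h) = w . h
    diff = [c - r for c, r in zip(h_chosen, h_rejected)]
    margin = sum(wi * di for wi, di in zip(w, diff))
    scale = -(1.0 - sigmoid(margin))  # prediction-error term
    return [scale * d for d in diff]  # proportional to h_chosen - h_rejected


def distance_normalized_grad(w, h_chosen, h_rejected, eps=1e-8):
    # HYPOTHETICAL rescaling for illustration only, not the NormBT formula:
    # divide the update by the pair's representation distance.
    diff = [c - r for c, r in zip(h_chosen, h_rejected)]
    grad = bt_grad_linear_head(w, h_chosen, h_rejected)
    return [g / (norm(diff) + eps) for g in grad]


w = [0.0, 0.0]  # zero head: both pairs have margin 0, so equal error terms

near_pair = ([1.0, 0.0], [0.9, 0.0])   # representations 0.1 apart
far_pair = ([1.0, 0.0], [-1.0, 0.0])   # representations 2.0 apart

g_near = bt_grad_linear_head(w, *near_pair)
g_far = bt_grad_linear_head(w, *far_pair)
n_near = distance_normalized_grad(w, *near_pair)
n_far = distance_normalized_grad(w, *far_pair)

print("raw gradient norms:       ", norm(g_near), norm(g_far))
print("distance-normalized norms:", norm(n_near), norm(n_far))
```

With equal prediction error, the raw update for the far pair is 20 times larger than for the near pair, matching the ratio of their distances, while the rescaled updates have nearly equal magnitude. A real reward model has a nonlinear backbone, so this only shows the mechanism in its last layer.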

IMPACT Helps reward models capture fine-grained distinctions between responses, potentially leading to more nuanced and reliable LLM alignment.

RewardBench
Bradley-Terry (BT) loss
NormBT