Brief · PulseAugur

RESEARCH · arXiv cs.LG · 3d · [2 sources]

PAWS: Preference Learning with Advantage-Weighted Segments

Researchers have introduced PAWS, a method for preference-based reinforcement learning that addresses a training-inference mismatch. PAWS uses segment-level advantage functions for policy updates, which aligns utility training with policy optimization, preserves preference information, and avoids relying on unreliable per-step signals. In experiments on robotic manipulation and locomotion tasks, PAWS outperforms existing approaches, which supports the case for distribution-consistent preference learning.
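To make the idea of a segment-level signal concrete, here is a minimal sketch. It is not the paper's implementation. It assumes a Bradley-Terry preference model over scalar segment utilities and an exponentiated-advantage weighting in the style of advantage-weighted regression; the brief confirms neither choice. All function names and the `temperature` parameter are hypothetical.

```python
import numpy as np


def preference_prob(u_a, u_b):
    """Bradley-Terry probability that segment A is preferred over segment B.

    Assumption: each segment is summarized by one scalar utility.
    """
    return 1.0 / (1.0 + np.exp(-(u_a - u_b)))


def segment_weights(segment_advantages, temperature=1.0):
    """Turn one advantage per segment into normalized positive weights.

    Assumption: exponentiated-advantage weighting; `temperature` is a
    hypothetical knob controlling how sharply high advantages dominate.
    """
    a = np.asarray(segment_advantages, dtype=float)
    z = (a - a.max()) / temperature  # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum()


def weighted_log_likelihood(step_log_probs, weights):
    """Weighted policy log-likelihood with one weight per segment.

    Every step in a segment shares that segment's weight, so no
    per-step advantage estimate is needed.
    """
    seg_log_probs = np.array([np.sum(lp) for lp in step_log_probs])
    return float(np.dot(weights, seg_log_probs))


if __name__ == "__main__":
    # Two toy segments with made-up advantages and per-step log-probs.
    w = segment_weights([1.0, 0.0])
    objective = weighted_log_likelihood([[-0.5, -0.5], [-1.0, -1.0, -1.0]], w)
    print(w, objective)
```

The point of the sketch is the shape of the update: the weight is computed once per segment and applied to the whole segment, rather than assigning a separate signal to each time step.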

IMPACT Improves temporal credit assignment and policy optimization in preference-based reinforcement learning through distribution-consistent preference learning.

Aleksandar Taranovic