RESEARCH · arXiv cs.CL English(EN) · 13mo · [5 sources]

Token-weighted Direct Preference Optimization with Attention

Researchers have introduced Token-weighted Direct Preference Optimization (TwDPO), a method for aligning large language models (LLMs) with human preferences. Standard DPO treats every token in a response as equally important; TwDPO instead assigns each token its own importance weight. The proposed instantiation, AttentionPO, uses the LLM's own attention mechanism to estimate these token weights dynamically, which makes the weighting content-aware and avoids the need for a separate weighting model. In the reported experiments, AttentionPO outperforms existing preference optimization techniques on benchmarks such as AlpacaEval and MT-Bench.
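To make the idea of per-token weights concrete, here is a minimal sketch of what a token-weighted DPO loss could look like. It is an illustration under stated assumptions, not the paper's formulation: the brief does not give the exact objective, so the sketch takes the standard DPO loss and replaces each sequence-level log-ratio with a weighted sum of per-token log-ratios. The function names, the dictionary layout, and the normalization (weights scaled to average 1, so uniform weights recover standard DPO) are all hypothetical. How AttentionPO derives its weights from attention is not reproduced here; the weights are simply passed in.

```python
import math


def normalize_weights(scores):
    """Scale non-negative scores so they average to 1.

    Assumption for this sketch: with this scaling, uniform scores give
    weights of exactly 1 per token, which recovers standard DPO.
    """
    total = sum(scores)
    if total <= 0:
        return [1.0] * len(scores)
    n = len(scores)
    return [s * n / total for s in scores]


def weighted_log_ratio(policy_logps, ref_logps, weights):
    """Weighted sum over tokens of log pi(token) - log pi_ref(token)."""
    return sum(w * (p - r) for w, p, r in zip(weights, policy_logps, ref_logps))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def token_weighted_dpo_loss(chosen, rejected, beta=0.1):
    """Hypothetical token-weighted DPO loss for one preference pair.

    `chosen` and `rejected` are dicts with three equal-length lists:
    'policy' (per-token log-probs under the policy), 'ref' (per-token
    log-probs under the reference model), and 'weights' (token weights).
    """
    margin = beta * (
        weighted_log_ratio(chosen["policy"], chosen["ref"], chosen["weights"])
        - weighted_log_ratio(rejected["policy"], rejected["ref"], rejected["weights"])
    )
    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return softplus(-margin)


if __name__ == "__main__":
    # Toy numbers, invented purely for illustration.
    chosen = {
        "policy": [-1.0, -1.0],
        "ref": [-1.0, -2.0],
        "weights": normalize_weights([1.0, 3.0]),
    }
    rejected = {"policy": [-1.5, -2.0], "ref": [-1.5, -2.0], "weights": [1.0, 1.0]}
    print(token_weighted_dpo_loss(chosen, rejected))
```

In this toy pair, the policy improves over the reference only on the second token of the chosen response. Giving that token a larger weight widens the preference margin and lowers the loss relative to uniform weighting, which is the intuition behind weighting tokens unequally.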

IMPACT By weighting tokens according to their importance, this method could make preference alignment more fine-grained and effective, which may improve the helpfulness and safety of LLMs.

AlpacaEval
ArenaHard
MT-Bench
TwDPO
AttentionPO