PulseAugur
Token-weighted Direct Preference Optimization with Attention

New TwDPO method uses LLM attention for better preference alignment

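Researchers have introduced Token-weighted Direct Preference Optimization (TwDPO), a new method for aligning large language models with human preferences. Unlike standard DPO, TwDPO assigns different importance weights to individual tokens in a response. The proposed implementation, AttentionPO, uses the LLM's own attention mechanism to estimate these token weights dynamically, making the process content-aware and efficient. Experiments show that AttentionPO significantly improves performance on benchmarks such as AlpacaEval and MT-Bench compared with existing preference optimization techniques.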
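
To make the idea of per-token weighting concrete, here is a minimal sketch of a token-weighted DPO loss in plain Python. It is an illustration under stated assumptions, not the paper's objective: the summary does not give the exact TwDPO formula or how AttentionPO turns attention into weights, so the `weights` lists below are hypothetical stand-ins for attention-derived scores, and the names `normalize_weights`, `weighted_logratio`, and `token_weighted_dpo_loss` are invented for this example. With uniform weights, the sketch reduces to the standard DPO loss.

```python
import math

def normalize_weights(scores):
    """Rescale non-negative scores so they average to 1.

    Assumption: with this scaling, uniform scores give weights of 1.0
    and the loss below falls back to standard (unweighted) DPO.
    """
    total = sum(scores)
    n = len(scores)
    return [s * n / total for s in scores]

def weighted_logratio(policy_logps, ref_logps, weights):
    """Weighted sum over tokens of log pi(token) - log pi_ref(token)."""
    return sum(w * (p - r) for w, p, r in zip(weights, policy_logps, ref_logps))

def token_weighted_dpo_loss(chosen, rejected, beta=0.1):
    """Negative log-sigmoid of the weighted preference margin.

    `chosen` and `rejected` are dicts with keys policy_logps, ref_logps,
    and weights (one entry per response token). The weights are
    hypothetical inputs standing in for attention-derived importance.
    """
    margin = beta * (weighted_logratio(**chosen) - weighted_logratio(**rejected))
    # -log(sigmoid(m)) = softplus(-m), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Toy example: the second chosen token is treated as three times as important.
chosen = {
    "policy_logps": [-1.0, -2.0],
    "ref_logps": [-1.5, -2.5],
    "weights": normalize_weights([1.0, 3.0]),
}
rejected = {
    "policy_logps": [-1.0],
    "ref_logps": [-1.0],
    "weights": [1.0],
}
loss = token_weighted_dpo_loss(chosen, rejected, beta=0.1)
```

Normalizing the weights to average 1 is a design choice made here so that the weighted and unweighted losses stay on a comparable scale; the actual method may normalize differently.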

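Impact: This new method could lead to more nuanced and effective alignment of LLMs with human preferences, improving their helpfulness and safety.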

Ranking rationale: This cluster contains an academic paper detailing a new method for LLM alignment.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 5 sources. How we write summaries →

Sources [5]

  1. arXiv cs.AI TIER_1 English(EN) · Shangpin Peng, Weinong Wang, Zhuotao Tian, Senqiao Yang, Xing Wu, Haotian Xu, Chengquan Zhang, Takashi Isobe, Baotian Hu, Min Zhang ·

    Uni-DPO: A Unified Paradigm for Dynamic Preference Optimization of LLMs

    arXiv:2506.10054v4 Announce Type: replace-cross Abstract: Direct Preference Optimization (DPO) has emerged as a cornerstone of reinforcement learning from human feedback (RLHF) due to its simplicity and efficiency. However, existing DPO-based methods typically treat all preferenc…

  2. arXiv cs.CL TIER_1 English(EN) · Xiaobo Wang, Zixia Jia, Jiaqi Li, Qi Liu, Zilong Zheng ·

    Adaptive Preference Optimization with Uncertainty-aware Utility Anchor

    arXiv:2509.10515v1 Announce Type: cross Abstract: Offline preference optimization methods are efficient for large language models (LLMs) alignment. Direct Preference optimization (DPO)-like learning, one of the most popular approaches, stands out for its efficiency in reward mode…

  3. arXiv cs.CL TIER_1 English(EN) · Chengyu Huang, Zhuohang Li, Sheng-Yen Chou, Claire Cardie ·

    Token-weighted Direct Preference Optimization with Attention

    arXiv:2605.21883v1 Announce Type: new Abstract: Direct Preference Optimization (DPO) aligns Large Language Models with human preferences without the need for a separate reward model. However, DPO treats all tokens in responses equally, neglecting the differing importance of indiv…

  4. arXiv cs.CL TIER_1 English(EN) · Claire Cardie ·

    Token-weighted Direct Preference Optimization with Attention

    Direct Preference Optimization (DPO) aligns Large Language Models with human preferences without the need for a separate reward model. However, DPO treats all tokens in responses equally, neglecting the differing importance of individual tokens. Existing token-level PO methods co…

  5. Together AI blog TIER_1 English(EN) ·

    Direct Preference Optimization: A Technical Deep Dive

    Together AI now supports DPO fine-tuning. Learn how Direct Preference Optimization aligns language models with human preferences — with code examples and technical details.