PulseAugur
Towards General Preference Alignment: Diffusion Models at Nash Equilibrium

Diffusion models use game theory and Nash equilibrium to align with human preferences

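Researchers have introduced a new framework, Diffusion Nash Preference Optimization (Diff.-NPO), for aligning text-to-image diffusion models with human preferences. Rather than following conventional approaches such as Direct Preference Optimization (DPO), the method frames diffusion-model alignment as a game-theoretic problem. Diff.-NPO has the policy improve by playing against itself, with the aim of capturing human preferences more comprehensively than existing models.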
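The summary describes Diff.-NPO only at a high level, and the source excerpts do not give its algorithm. As a hedged illustration of the underlying idea, the sketch below treats preference alignment as a symmetric two-player game over a tiny, hypothetical set of three candidate outputs and runs multiplicative-weights self-play, a standard no-regret method that is not necessarily what the paper uses. The preference matrix `P`, the functions `win_rates`, `exploitability`, and `self_play`, and the step size are all illustrative assumptions. Because the preferences are cyclic, no single scalar reward ranks the outputs, yet the time-averaged self-play policy approaches the Nash equilibrium, where no alternative policy wins more than half the time.

```python
import math

# Hypothetical toy preferences over three candidate images A, B, C:
# P[i][j] = probability that a human prefers output i over output j.
# They are cyclic (A > B, B > C, C > A), so no single scalar reward
# (as a Bradley-Terry model assumes) can explain them.
P = [
    [0.5, 1.0, 0.0],
    [0.0, 0.5, 1.0],
    [1.0, 0.0, 0.5],
]


def win_rates(policy):
    """Probability that each output beats a sample drawn from `policy`."""
    return [sum(P[i][j] * policy[j] for j in range(len(policy))) for i in range(len(P))]


def exploitability(policy):
    """How far the best-responding output beats `policy` beyond a 50% win rate."""
    return max(win_rates(policy)) - 0.5


def self_play(policy, eta=0.1, steps=5000):
    """Multiplicative-weights self-play (an assumed stand-in, not Diff.-NPO).

    Each round the policy plays against its current self and upweights
    outputs in proportion to how often they win. Returns the time-averaged
    policy, which approaches a Nash equilibrium of the preference game.
    """
    log_w = [math.log(p) for p in policy]
    avg = [0.0] * len(policy)
    for _ in range(steps):
        # Normalise log-weights into the current policy (subtract max for stability).
        m = max(log_w)
        w = [math.exp(v - m) for v in log_w]
        z = sum(w)
        current = [v / z for v in w]
        avg = [a + c / steps for a, c in zip(avg, current)]
        # Gain of each output = its win rate against the current policy.
        gains = win_rates(current)
        log_w = [v + eta * g for v, g in zip(log_w, gains)]
    return avg


start = [0.6, 0.3, 0.1]
print(round(exploitability(start), 2))  # 0.15
equilibrium = self_play(start)
print([round(p, 2) for p in equilibrium])
```

Starting from the skewed policy `[0.6, 0.3, 0.1]`, which the best-responding output beats 65% of the time, self-play drives the averaged policy toward the uniform mix over the three outputs. The averaged rather than the final iterate is returned because multiplicative-weights dynamics in zero-sum games tend to cycle around the equilibrium instead of settling on it.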

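Impact: Introduces a game-theoretic approach to diffusion-model alignment that could improve preference modeling beyond current DPO-based methods.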

Ranking rationale: This cluster contains an academic paper detailing a new method for aligning diffusion models.


AI-generated summary · Google Gemini · from 2 sources.


Sources [2]

  1. arXiv cs.LG TIER_1 English(EN) · Jiaming Hu, Jiamu Bai, Haoyu Wang, Debarghya Mukherjee, Ioannis Ch. Paschalidis

    Towards General Preference Alignment: Diffusion Models at Nash Equilibrium

    arXiv:2605.04494v1 Announce Type: new Abstract: Reinforcement learning from human feedback (RLHF) has been popular for aligning text-to-image (T2I) diffusion models with human preferences. As a mainstream branch of RLHF, Direct Preference Optimization (DPO) offers a computational…

  2. arXiv cs.CV TIER_1 English(EN) · Ioannis Ch. Paschalidis

    Towards General Preference Alignment: Diffusion Models at Nash Equilibrium

    Reinforcement learning from human feedback (RLHF) has been popular for aligning text-to-image (T2I) diffusion models with human preferences. As a mainstream branch of RLHF, Direct Preference Optimization (DPO) offers a computationally efficient alternative that avoids explicit re…
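The abstracts describe DPO as a computationally efficient branch of RLHF that avoids an explicit reward model. For reference, here is a minimal sketch of the standard DPO objective for a single preference pair, written over plain log-likelihood values. The function names `dpo_loss` and `softplus` and the default `beta=0.1` are illustrative choices, and the log-likelihoods are assumed to be given: diffusion models do not expose exact image log-likelihoods, so diffusion-specific DPO variants substitute an approximation, and the excerpts do not say how this paper handles that.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    logp_w / logp_l: log-likelihoods of the preferred (w) and rejected (l)
    samples under the policy being trained; ref_logp_*: the same quantities
    under the frozen reference model. beta scales the implicit KL penalty.
    """
    # Implicit reward margin: how much more the policy has upweighted the
    # preferred sample than the rejected one, relative to the reference.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


# Policy identical to the reference: margin 0, loss = ln 2.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931
# Policy has shifted probability toward the preferred sample: lower loss.
print(dpo_loss(-8.0, -14.0, -10.0, -12.0) < math.log(2))  # True
```

DPO's objective is derived from a Bradley-Terry model, in which every output carries a single scalar reward, so it cannot represent cyclic preferences. The "general preference" framing in the paper's title appears aimed at relaxing that assumption through the Nash-equilibrium view summarized above.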