Do Understanding and Generation Fight? A Diagnostic Study of DPO for Unified Multimodal Models

Study finds DPO struggles to align both understanding and generation in unified multimodal models

By the PulseAugur editorial team · [2 sources] · 2026-05-26 04:00

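A recent study of unified multimodal models finds that direct preference optimization (DPO) has difficulty improving image understanding and image generation at the same time. Generation quality proved hard to align with DPO: one model's generation performance degraded, while the other showed nearly orthogonal gradients between the understanding and generation tasks. The authors attribute this interference to a pronounced imbalance in token magnitudes, which suggests that discrete VQ tokenization may be a bottleneck for unified models.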
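The sketch below illustrates the two quantities this summary relies on. It is a minimal sketch under stated assumptions, not the authors' code. `dpo_loss` implements the standard per-pair DPO objective, -log σ(β·[(log π_θ(y_w|x) - log π_ref(y_w|x)) - (log π_θ(y_l|x) - log π_ref(y_l|x))]), taking summed sequence log-probabilities as inputs. `cosine_similarity` is one common way (an assumption here; the summary does not say how the study measured it) to quantify how "orthogonal" two task gradients are: a value near 0 means a step that helps one task does little for the other. The function names, the default `beta=0.1`, and the toy gradient vectors are hypothetical.

```python
import math
from typing import Sequence


def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * margin))."""
    # Log-ratio of the trainable policy to the frozen reference, per response
    chosen_ratio = policy_logp_chosen - ref_logp_chosen
    rejected_ratio = policy_logp_rejected - ref_logp_rejected
    z = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(z)) == softplus(-z); the branch keeps exp() from overflowing
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def cosine_similarity(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Cosine of the angle between two flattened gradient vectors."""
    if len(g1) != len(g2):
        raise ValueError("gradients must have the same length")
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = math.sqrt(sum(a * a for a in g1))
    n2 = math.sqrt(sum(b * b for b in g2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (n1 * n2)


if __name__ == "__main__":
    # Policy identical to reference: the margin is 0, so the loss is log(2)
    print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
    # Hypothetical toy gradients for an understanding loss and a generation loss
    g_und = [0.3, -0.1, 0.0, 0.2]
    g_gen = [0.1, 0.3, 0.5, 0.0]
    print(cosine_similarity(g_und, g_gen))
```

With real models, each task's per-parameter gradients would first be flattened into a single vector before comparison; the summary does not state which parameters or layers the study compared.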

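Impact: The findings suggest that current alignment methods may not reliably improve both understanding and generation in unified multimodal models, which could shape how future models are developed.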

Ranking rationale: This cluster contains two academic papers on improving unified multimodal models.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI · Abinav Rao, Sujan Rachuri · 2026-05-26 04:00

Do Understanding and Generation Fight? A Diagnostic Study of DPO for Unified Multimodal Models

arXiv:2603.17044v2 Announce Type: replace-cross Abstract: Unified multimodal models share a language model backbone for both understanding and generating images. Can DPO align both capabilities simultaneously? We present the first systematic study of this question, applying DPO t…
arXiv cs.LG · Zihan Su, Hongyang Wei, Kangrui Cen, Yong Wang, Guanhua Chen, Chun Yuan, Xiangxiang Chu · 2026-05-26 04:00

Generation Enhances Understanding in Unified Multimodal Models via Multi-Representation Generation

arXiv:2601.21406v3 Announce Type: replace-cross Abstract: Unified Multimodal Models (UMMs) integrate both visual understanding and generation within a single framework. Their ultimate aspiration is to create a cycle where understanding and generation mutually reinforce each other…
