A recent study on unified multimodal models found that Direct Preference Optimization (DPO) struggles to improve image understanding and image generation at the same time. Generation quality resisted DPO alignment: one model's generation performance degraded, and another showed near-orthogonal gradients between the understanding and generation tasks. The authors attribute this interference to a significant imbalance in token magnitudes, which points to discrete VQ (vector-quantized) tokenization as a potential bottleneck for unified models.
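"Near-orthogonal gradients" means the two tasks' gradient vectors have a cosine similarity close to zero, so one task's update does little to help the other. The summary does not say how the papers measured this. The sketch below uses the standard cosine-similarity diagnostic on flattened gradients as an assumed stand-in. The function name `cosine_similarity` and the toy gradient values are hypothetical and are not taken from either study.

```python
import math


def cosine_similarity(g1, g2):
    """Cosine of the angle between two flattened gradient vectors (hypothetical helper)."""
    if len(g1) != len(g2):
        raise ValueError("gradients must have the same length")
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = math.sqrt(sum(a * a for a in g1))
    n2 = math.sqrt(sum(b * b for b in g2))
    if n1 == 0 or n2 == 0:
        raise ValueError("zero-norm gradient")
    return dot / (n1 * n2)


# Illustrative toy gradients only (assumed values, not results from the papers).
grad_understanding = [0.9, 0.1, -0.2, 0.0]
grad_generation = [0.05, -0.3, 0.1, 0.8]

cos = cosine_similarity(grad_understanding, grad_generation)
# A value near 0 means the gradients are close to orthogonal;
# a negative value would indicate outright conflict between the tasks.
print(f"cosine similarity: {cos:.3f}")
```

In practice, each vector would be the gradient of one task's loss with respect to the shared parameters, flattened into a single list. Tracking this value during training is one common way to detect task interference.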
Impact: The findings suggest that current alignment methods may not improve both understanding and generation in unified multimodal models, which could shape how future models are designed.
Ranking rationale: The cluster contains two academic papers on methods for improving unified multimodal models.
AI-generated summary · Google Gemini · from 2 sources.