Brief · PulseAugur

RESEARCH · arXiv cs.AI · English (EN) · 15h · 2 sources

Do Understanding and Generation Fight? A Diagnostic Study of DPO for Unified Multimodal Models

A recent study of unified multimodal models found that Direct Preference Optimization (DPO) struggles to improve image understanding and image generation at the same time. Generation quality proved resistant to DPO alignment: one model's generation performance degraded, and another showed nearly orthogonal gradients between the understanding and generation tasks. The study attributes this interference to a large imbalance in token magnitudes, which points to discrete VQ (vector-quantized) tokenization as a potential bottleneck for unified models.
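To make the terms above concrete, here is a minimal Python sketch of the standard DPO loss (the formulation introduced by Rafailov et al., 2023) and of gradient cosine similarity, a common way to check whether two objectives' gradients on shared weights align (near 1), conflict (negative), or are near-orthogonal (near 0). The function names, the `beta` default of 0.1, and the toy numbers are illustrative assumptions; the study's own diagnostic procedure may differ.

```python
import math
from typing import Sequence


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair (standard formulation).

    Inputs are summed log-probabilities of the chosen and rejected responses
    under the trainable policy and the frozen reference model. beta=0.1 is an
    illustrative default, not a value taken from the study.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) favors the chosen response over the rejected one.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(beta * margin)) == softplus(-beta * margin), computed stably.
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def cosine_similarity(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Cosine of the angle between two flattened gradient vectors.

    Near 1: both objectives push shared weights the same way; near 0:
    near-orthogonal; below 0: conflicting updates. Returns 0.0 if either
    vector is all zeros.
    """
    if len(g1) != len(g2):
        raise ValueError("gradient vectors must have the same length")
    dot = sum(a * b for a, b in zip(g1, g2))
    norm1 = math.sqrt(sum(a * a for a in g1))
    norm2 = math.sqrt(sum(b * b for b in g2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)


if __name__ == "__main__":
    # Hypothetical log-probabilities for one preference pair.
    print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
    # Hypothetical gradients of the understanding and generation losses with
    # respect to the same shared parameters, flattened to vectors.
    grad_understanding = [0.5, -0.2, 0.1]
    grad_generation = [0.1, 0.3, 0.1]
    print(cosine_similarity(grad_understanding, grad_generation))
```

In a real model the two gradient vectors would be collected over the shared backbone parameters; a cosine close to 0, as with the toy vectors here, corresponds to the "near-orthogonal" case described above.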

IMPACT · The findings suggest that current alignment methods may not improve understanding and generation together in unified multimodal models, which could affect future model development.

Unified multimodal models
Direct Preference Optimization (DPO)
Abinav Rao