Brief · PulseAugur

TOOL · arXiv cs.CV English(EN) · 9h

Prompt Reinjection: Alleviating Prompt Forgetting in Multimodal Diffusion Transformers

Researchers have identified a "prompt forgetting" problem in Multimodal Diffusion Transformers (MMDiTs) used for text-to-image generation: the semantic representation of the text prompt degrades as it passes through deeper layers of the model. To address this, they propose "prompt reinjection", a training-free method that reintroduces early-layer prompt representations into later layers. Experiments on models such as SD3, SD3.5, and FLUX.1 show that the technique improves instruction following and overall generation quality.
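
The idea of reinjecting early-layer prompt features can be illustrated with a toy sketch. This is not the paper's implementation: the brief does not state the exact injection rule, so the additive form, the names `reinject` and `run_text_stream`, the parameters `source_layer`, `start_layer`, and `weight`, and the halving block that stands in for "forgetting" are all assumptions made for illustration.

```python
from typing import Callable, List, Optional

Vector = List[float]
Tokens = List[Vector]  # one feature vector per prompt token


def reinject(current: Tokens, early: Tokens, weight: float) -> Tokens:
    """Add a scaled copy of cached early-layer prompt features (assumed rule)."""
    return [
        [c + weight * e for c, e in zip(cur_tok, early_tok)]
        for cur_tok, early_tok in zip(current, early)
    ]


def run_text_stream(
    prompt: Tokens,
    blocks: List[Callable[[Tokens], Tokens]],
    source_layer: int,
    start_layer: int,
    weight: float,
) -> Tokens:
    """Run the prompt through the blocks, reinjecting cached features later on.

    The output of block `source_layer` is cached; before every block with
    index >= `start_layer`, the cache is added back into the text features.
    No weights are trained, which mirrors the "training-free" claim.
    """
    hidden = prompt
    cached: Optional[Tokens] = None
    for i, block in enumerate(blocks):
        if cached is not None and i >= start_layer:
            hidden = reinject(hidden, cached, weight)
        hidden = block(hidden)
        if i == source_layer:
            cached = hidden
    return hidden


def halve(tokens: Tokens) -> Tokens:
    """Hypothetical block that shrinks features, mimicking prompt forgetting."""
    return [[0.5 * x for x in tok] for tok in tokens]


prompt = [[1.0, 2.0]]
blocks = [halve] * 4

baseline = run_text_stream(prompt, blocks, source_layer=0, start_layer=2, weight=0.0)
reinjected = run_text_stream(prompt, blocks, source_layer=0, start_layer=2, weight=1.0)

print(baseline)    # [[0.0625, 0.125]]
print(reinjected)  # [[0.4375, 0.875]]
```

In this toy setting the prompt signal that reaches the last block is larger with reinjection than without it. In a real MMDiT the same pattern would apply to the text-token stream of the joint attention blocks, and the choice of source layer, target layers, and weight would need to be tuned per model.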

IMPACT This research offers a training-free technique for improving instruction following in current text-to-image diffusion models.

SD3.5
Yuxuan Yao
Multimodal Diffusion Transformers