PulseAugur

New method combats prompt forgetting in text-to-image models

Researchers have identified a "prompt forgetting" problem in Multimodal Diffusion Transformers (MMDiTs) used for text-to-image generation: the semantic representation of the text prompt degrades as it passes through the model's deeper layers. To address this, they propose "prompt reinjection," a training-free method that reintroduces early-layer prompt representations into later layers. Experiments on models such as SD3, SD3.5, and FLUX.1 show that the technique improves instruction following and overall generation quality.
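The idea of reusing an early-layer prompt representation in later layers can be illustrated with a toy sketch. This is not the paper's implementation: the layers here are simple stand-in functions rather than MMDiT blocks, and the names `run_text_branch`, `source_layer`, `inject_from`, and `alpha`, as well as the additive blending rule, are assumptions chosen for illustration only.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def run_text_branch(
    prompt: Vector,
    layers: Sequence[Layer],
    source_layer: int = 0,
    inject_from: Optional[int] = None,
    alpha: float = 0.5,
) -> Vector:
    """Pass a prompt vector through a stack of layers.

    If `inject_from` is set, the output cached at `source_layer` is added
    (scaled by `alpha`) to the hidden state before every layer with index
    >= `inject_from`. The additive rule is an assumption, not the paper's.
    """
    hidden = list(prompt)
    cached: Optional[Vector] = None
    for i, layer in enumerate(layers):
        # Reinject the cached early representation into later layers.
        if inject_from is not None and cached is not None and i >= inject_from:
            hidden = [h + alpha * c for h, c in zip(hidden, cached)]
        hidden = layer(hidden)
        # Cache the representation produced by the chosen early layer.
        if i == source_layer:
            cached = list(hidden)
    return hidden


def decay(v: Vector) -> Vector:
    """Toy layer that halves the signal, mimicking gradual forgetting."""
    return [0.5 * x for x in v]


layers = [decay, decay, decay, decay]
prompt = [1.0, -2.0]

baseline = run_text_branch(prompt, layers)
reinjected = run_text_branch(prompt, layers, source_layer=0, inject_from=2)

print("baseline:  ", baseline)
print("reinjected:", reinjected)
```

In this toy setting the prompt signal shrinks at every layer, and adding the cached early representation back in keeps more of it alive at the output. No weights are trained or updated, which mirrors the training-free character described in the summary.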

IMPACT This research offers a technique to enhance the instruction-following capabilities of current text-to-image diffusion models.

RANK_REASON The cluster contains an academic paper detailing a new method for improving existing models.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.CV TIER_1 English(EN) · Yuxuan Yao, Yuxuan Chen, Hui Li, Kaihui Cheng, Qipeng Guo, Yuwei Sun, Zilong Dong, Jingdong Wang, Siyu Zhu

    Prompt Reinjection: Alleviating Prompt Forgetting in Multimodal Diffusion Transformers

    arXiv:2602.06886v4 Announce Type: replace Abstract: Multimodal Diffusion Transformers (MMDiTs) for text-to-image generation maintain separate text and image branches, with bidirectional information flow between text tokens and visual latents throughout denoising. In this setting,…