Researchers have introduced Direct Diffusion Score Preference Optimization (DDSPO), a method for training diffusion models to better align with user intent and aesthetic quality. Unlike previous approaches that relied on approximations from the forward diffusion process, DDSPO directly supervises backward denoising transitions using a contrastive policy pair. The method can be implemented either by training separate winning and losing models or by using a pretrained reference model with semantic variations, which removes the need for reward modeling or manual annotations. Empirical results indicate that this contrastive-policy-pair supervision is more effective than existing forward-process-based methods at improving text-image alignment and aesthetic quality.
IMPACT This new training method could lead to diffusion models that better understand and execute complex user instructions, improving their utility in creative applications.
RANK_REASON The cluster contains an academic paper detailing a new method for training diffusion models.
- DDSPO
- Diffusion Direct Preference Optimization
- Diffusion models
- Direct Diffusion Score Preference Optimization
- Dohyun Kim
AI-generated summary · Google Gemini · from 1 source.