UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models
Researchers have introduced UDM-GRPO, a framework that applies Group Relative Policy Optimization (GRPO), a reinforcement learning method, to Uniform Discrete Diffusion Models (UDMs) for discrete generative modeling. The method improves training stability and performance by treating the final clean sample as the action and reconstructing trajectories via the diffusion forward process. Two additional strategies, Reduced-Step and CFG-Free, further improve efficiency. The authors report state-of-the-art results on text-to-image tasks and OCR benchmarks, among other applications.
IMPACT: This research could lead to more stable and efficient discrete generative models, improving performance in tasks like text-to-image generation and OCR.
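The two ideas named in the summary, group-relative advantages and trajectories rebuilt from the clean sample with the forward process, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the `keep_prob` schedule, and the choice to draw each noised state independently from the clean sample are all assumptions made for illustration; the summary does not specify these details.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO-style advantage: normalize rewards within one group."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def uniform_forward_noise(x0, keep_prob, vocab_size, rng):
    """Uniform discrete forward corruption (assumed form).

    Each token is kept with probability keep_prob; otherwise it is
    replaced by a token drawn uniformly from the vocabulary.
    """
    x0 = np.asarray(x0)
    keep = rng.random(x0.shape) < keep_prob
    random_tokens = rng.integers(0, vocab_size, size=x0.shape)
    return np.where(keep, x0, random_tokens)

def reconstruct_trajectory(x0, keep_probs, vocab_size, rng):
    """Rebuild noised states from the final clean sample x0.

    Assumption: each state is sampled independently from x0 at its own
    noise level, rather than chained step by step.
    """
    return [uniform_forward_noise(x0, p, vocab_size, rng) for p in keep_probs]

# Hypothetical usage: one clean sample of 8 tokens, three noise levels.
rng = np.random.default_rng(0)
clean_sample = rng.integers(0, 16, size=8)
trajectory = reconstruct_trajectory(clean_sample, [0.9, 0.5, 0.1], 16, rng)
advantages = group_relative_advantages([0.2, 0.7, 0.4, 0.9])
```

In this sketch the reward would be computed once on the clean sample, and the resulting advantage would weight the policy loss at each reconstructed state.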