Composing People Together: Iterative Pose-Image Generation for Multi-Person Interaction Scenes
Researchers have developed a method for generating realistic images of multiple people interacting, addressing limitations of current text-to-image models. The approach uses a dual pose-image representation that integrates structural priors into diffusion transformers, so that pose and appearance develop together. A cross-modal alignment scheme and an iterative scene construction process improve prompt alignment and scene diversity in complex multi-person image generation.
AI IMPACT: Introduces a technique for generating more accurate and diverse multi-person scenes, potentially improving applications in creative tools and virtual environments.