MM-DiT
PulseAugur coverage of MM-DiT — every cluster mentioning MM-DiT across labs, papers, and developer communities, ranked by signal.
-
New method enables text-and-image-to-image generation without retraining
Researchers have developed TF-TI2I, a novel method for text-and-image-to-image generation that adapts existing text-to-image models without requiring further training. This approach leverages the MM-DiT architecture, en…
-
New method enables open-vocabulary scene text editing with style consistency
Researchers have developed a novel self-prompting method for editing scene text in images, addressing limitations of existing approaches that neglect visual details of target regions and are constrained by pre-trained g…
-
New method improves AI portrait generation by balancing alignment, realism, and aesthetics
Researchers have developed a new method to improve human portrait generation in text-to-image diffusion models, addressing the common trade-offs between text-image alignment, realism, and aesthetics. Their approach uses…
-
Galaxy General LDA-1B model unifies diverse data for embodied AI's GPT-2 moment
Galaxy General LDA has introduced LDA-1B, a 1.6 billion parameter model designed to unify the utilization of diverse data sources for embodied AI. This model employs a novel World-Action Fusion approach, enabling it to …
-
UniSonate model unifies speech, music, and sound effect generation
Researchers have developed UniSonate, a novel unified framework for generating speech, music, and sound effects using natural language instructions. This model addresses the fragmentation in generative audio by reconcil…