Diffusion Transformer
PulseAugur coverage of Diffusion Transformer — every cluster mentioning Diffusion Transformer across labs, papers, and developer communities, ranked by signal.
-
Alibaba releases open-source Wan 2.1 video generation suite
Alibaba's Wan team has released Wan 2.1, an open-source video generation model suite that aims to make high-quality video generation more accessible. The suite includes capabilities for text-to-video, image-to-video, an…
-
RayPE encoding boosts 3D awareness in video generation models
Researchers have developed RayPE, a novel positional encoding method for video diffusion transformers that enhances 3D awareness. Unlike existing methods that use camera grid coordinates, RayPE incorporates 6D Plücker c…
-
New Diffusion Transformer framework enhances pattern-preserving attribute retrieval
Researchers have introduced a novel framework called MO-DiT+HPPO for pattern-preserving attribute retrieval. This method uses a diffusion transformer to generate query embeddings that satisfy specific attributes while m…
-
New CCUA method boosts AI image generation for rare classes
Researchers have developed a new method called Contrastive Conditional-Unconditional Alignment (CCUA) to improve the quality and diversity of images generated by diffusion models, particularly for classes with limited t…
-
PhysiFormer uses coordinate-space diffusion for physically-plausible 3D object motion simulation · 3 sources tracked
Researchers have developed PhysiFormer, a novel diffusion transformer capable of simulating physically plausible 3D object motions. Unlike previous methods that operate in pixel space, PhysiFormer works directly with 3D…
-
PID method enhances Krea 2 image generation quality
A new method called PID (Pixel Diffusion) has been developed to improve image generation quality in Krea 2, which previously used a suboptimal VAE. PID decodes images directly in pixel space, bypassing the VAE to enhanc…
-
TryOnCrafter framework enables camera-controllable video virtual try-on · 3 sources tracked
Researchers have introduced TryOnCrafter, a novel framework for camera-controllable video virtual try-on. This system moves beyond existing methods by decoupling human subjects from their environments using a renderable…
-
DiffusionBench benchmark and NanoGen framework challenge image generation evaluation
Researchers have introduced DiffusionBench, a new benchmark designed to holistically evaluate diffusion transformers (DiTs) used in image generation. The benchmark highlights that current evaluation methods, primarily f…
-
New methods enhance text-to-image generation with improved rewards and simplified models
Researchers have developed new methods for improving text-to-image generation models. DiT-Reward, a novel approach, leverages pretrained Diffusion Transformers to create reward models that outperform existing methods on…
-
Vera layered diffusion model enhances video editing with content preservation
Researchers have introduced Vera, a novel layered diffusion framework designed for content-preserving video editing. Unlike existing methods that regenerate entire videos, Vera focuses on generating an edit layer and an…
-
New SteerVTE framework enables precise video text editing
Researchers have introduced SteerVTE, a novel framework designed for precise text editing within videos. This system leverages a frozen video diffusion model, enhanced by a lightweight adapter that captures the original…
-
New AI frameworks enhance video editing with content preservation and real-time capabilities
Researchers have developed new frameworks for video editing, addressing limitations in current automated systems. VideoAgent offers an all-in-one solution for diverse video comprehension and editing tasks, utilizing a m…
-
MeshFlow generates triangle meshes 18x faster using equivariant flow matching · 2 sources tracked
Researchers have developed MeshFlow, a novel method for generating triangle meshes using equivariant optimal-transport flow matching models. This approach directly models triangle soups, respecting symmetries like verte…
-
New Delta-Diffusion model synthesizes longitudinal brain amyloid-PET data
Researchers have developed Delta-Diffusion, a new framework for synthesizing longitudinal brain amyloid-PET imaging data. This method uses a conditional Poisson Diffusion Bridge process, anchored to a subject's baseline…
-
TetriServe system improves DiT model serving efficiency
Researchers have developed TetriServe, a novel system designed to efficiently serve Diffusion Transformer (DiT) models, whose image generation workloads are computationally intensive. Traditional serving methods struggle with…
-
EndoCoT framework enhances diffusion models' reasoning with MLLMs
Researchers have introduced EndoCoT, a new framework designed to enhance the reasoning capabilities of diffusion models when integrated with Multimodal Large Language Models (MLLMs). The framework addresses limitations …
-
Hybrid Diffusion Transformer Enhances Instruction-Guided Audio Editing
Researchers have developed a novel hybrid diffusion transformer architecture for instruction-guided audio editing. This two-stage approach, based on rectified flow matching, aims to improve both the accuracy and efficie…
-
Krea.ai releases Krea 2 text-to-image models
Krea.ai, Inc. has released two new text-to-image diffusion models, Krea 2 Raw and Krea 2 Turbo, both featuring a Diffusion Transformer architecture with 12 billion parameters. The Raw version is intended as a base for f…
-
Ghost Attractor Networks offer efficient sequential generation with stable latent structures
Researchers have introduced Ghost Attractor Networks (GANs), a novel dynamical decoder designed to improve sequential generation efficiency and control in large-scale models. GANs utilize a learned potential with a basi…
-
Spotlight system cuts DiT RL post-training costs using spot GPUs
Researchers have developed Spotlight, a novel system designed to significantly reduce the cost of reinforcement learning (RL) post-training for Diffusion Transformers (DiTs). By leveraging insights into exploration tolerance…