Diffusion Transformer
PulseAugur coverage of Diffusion Transformer — every cluster mentioning Diffusion Transformer across labs, papers, and developer communities, ranked by signal.
3 days with sentiment data
-
Robotics imitation learning pipeline suffers from extremely slow training
A user on r/MachineLearning is seeking advice on an extremely slow training pipeline for imitation learning in robotics. Despite using a Diffusion Transformer (DiT) model with approximately 50 million paramete…
-
PDF RAG pipelines fail due to layout; layout-aware chunking is the fix
Retrieval-Augmented Generation (RAG) pipelines often fail with PDF documents due to naive text splitting methods that ignore the document's layout. This leads to corrupted chunks containing concatenated columns, misplac…
-
Diffusion model speedup hinges on overhead reduction, not just fewer steps
Single-image diffusion model inference is bottlenecked by kernel launch overhead and attention memory traffic rather than by raw compute throughput. Optimizing with `torch.compile` in `reduce-overhead` mode, employing a fused …
-
OcclusionFormer tackles image generation occlusion with new framework
Researchers have developed OcclusionFormer, a new framework designed to improve layout-grounded image generation by explicitly handling inter-object occlusion. Existing models struggle when bounding boxes overlap, leadi…
-
New dataset reveals semantic loss in VLM-based video editing
Researchers have developed a new diagnostic dataset and protocol called TRACE-Edit to evaluate how well semantic information is preserved when Vision-Language Models (VLMs) are used for video editing. Their findings ind…
-
New framework REPA-P enhances physics diffusion models without inference overhead
Researchers have developed a new framework called REPA-P to improve the accuracy and robustness of physics-informed diffusion models. This method aligns intermediate model representations with physical states during tra…
-
New methods boost video diffusion model efficiency and quality
Researchers have developed several new techniques to improve video diffusion models, focusing on efficiency and quality. One approach, LocalDPO, optimizes alignment at a localized spatio-temporal region level for better…
-
Diffusion Transformers advance image generation and material transfer
Researchers have introduced several advancements in Diffusion Transformer (DiT) architectures for image generation and manipulation. One paper explores the use of register tokens in pixel-space DiTs to improve convergen…
-
New diffusion model erases video subtitles in one step
Researchers have developed SEDiT, a novel one-stage diffusion transformer model designed for mask-free video subtitle erasure. This approach directly removes subtitles without requiring a pre-extracted mask, improving u…
-
Cold diffusion tackles percussive audio dereverberation
Researchers have developed a novel cold diffusion framework to address the challenge of dereverberating percussive audio signals, such as drums, which have been largely overlooked in favor of speech processing. This new…
-
New theory resolves instability in MeanFlow generative models
Researchers have developed a theoretical framework to address instability issues in MeanFlow training, a one-step generative modeling technique. They identified that the conditional velocity field is misused in the loss…
-
New BRIDGE method improves local image editing by controlling mask influence
Researchers have developed a new method called BRIDGE for local image editing, which aims to modify specific regions of an image while keeping the background intact. This approach tackles the issue of "mask-shape bias,"…
-
X-Cache accelerates world model inference for autonomous driving simulations
Researchers have developed X-Cache, a novel method to accelerate the inference of autoregressive world models used in autonomous driving simulations. This technique caches residual computations across generation chunks …
-
StyleShield framework evades AI content detectors with controllable style transfer
Researchers have developed StyleShield, a novel framework that manipulates text style in the continuous token embedding space to evade AI-generated content detectors. This method utilizes a DiT backbone with cross-atten…
-
Ortho-Hydra paper introduces new method to improve LoRA fine-tuning for diffusion transformers
Researchers have introduced Ortho-Hydra, a novel re-parameterization technique designed to improve LoRA fine-tuning for diffusion transformers (DiT) on multi-style data. This method addresses the issue of 'style bleed' …
-
Mamoda2.5 model integrates multimodal AI with efficient DiT-MoE for top video editing
Researchers have introduced Mamoda2.5, a unified AR-Diffusion framework designed for multimodal understanding and generation. This model utilizes a Diffusion Transformer backbone enhanced with a Mixture-of-Experts (MoE)…
-
New AI methods enhance time series forecasting accuracy and interpretability
Researchers have introduced several new methods for time-series forecasting, aiming to improve accuracy and generalization. MeLISA, a latent-free autoregressive model, enhances rollout efficiency and long-horizon statis…
-
Video Generation with Predictive Latents
Researchers have developed several new methods to improve the efficiency and quality of visual generative models. DC-DiT introduces dynamic chunking to Diffusion Transformers, adaptively compressing visual data for fast…
-
YOSE framework speeds up video object removal with token selection
Researchers have developed YOSE, a new framework designed to significantly speed up video object removal using Diffusion Transformer (DiT) models. YOSE achieves this efficiency by adaptively selecting only the essential…
-
Researchers release TripVVT dataset and framework for in-the-wild video virtual try-on
Researchers have introduced TripVVT, a new framework for in-the-wild video virtual try-on, addressing limitations caused by scarce data and improper mask usage. The system utilizes a Diffusion Transformer and a stable h…