ByPulseAugur Editorial·
from 16 sources
Researchers are exploring advanced diffusion models for video generation, addressing challenges like temporal consistency and data scarcity. New methods focus on improving parameterization, such as the v-prediction technique, and incorporating conditional sampling for tasks like extending video length or filling missing frames. Efforts are also underway to enhance efficiency and controllability through post-training frameworks, hybrid attention mechanisms, and semantic-visual adaptation, aiming for real-time generation and higher quality outputs.
AI
IMPACT
Advances in diffusion models are improving video generation quality, efficiency, and controllability, potentially enabling new applications in content creation and analysis.
RANK_REASON
Multiple arXiv papers and Hugging Face blog posts detail new research and techniques in video generation using diffusion models.
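The v-prediction parameterization mentioned in the summary can be sketched in a few lines of NumPy (an illustrative sketch under the standard variance-preserving convention, where the noisy sample is x_t = alpha_t·x0 + sigma_t·eps with alpha_t² + sigma_t² = 1; names here are not from any specific paper in this digest):

```python
import numpy as np

def v_prediction_target(x0, eps, alpha_t, sigma_t):
    """Training target under v-prediction (Salimans & Ho, 2022):
    v = alpha_t * eps - sigma_t * x0."""
    return alpha_t * eps - sigma_t * x0

def x0_from_v(x_t, v, alpha_t, sigma_t):
    """Recover the clean sample from a predicted v.
    Using x_t = alpha_t*x0 + sigma_t*eps and alpha_t^2 + sigma_t^2 = 1:
    alpha_t*x_t - sigma_t*v = (alpha_t^2 + sigma_t^2) * x0 = x0."""
    return alpha_t * x_t - sigma_t * v
```

Compared with plain epsilon-prediction, this target stays well-conditioned near both ends of the noise schedule, which is one reason it shows up in recent video diffusion work.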
<p><a href="https://lilianweng.github.io/posts/2021-07-11-diffusion-models/">Diffusion models</a> have demonstrated strong results on image synthesis in recent years. Now the research community has started working on a harder task—using them for video generation. The task itsel…
arXiv:2605.04830v1 Announce Type: new Abstract: Diffusion models undergo a phase transition in a critical time window during generation dynamics, as revealed by two complementary diagnostics of criticality. The symmetry breaking picture views the critical window as when trajectories bifurcat…
Advancing face morphing attack techniques is crucial to anticipate evolving threats and develop robust defensive mechanisms for identity verification systems. This work introduces DCMorph, a dual-stream diffusion-based morphing framework that simultaneously operates at both ident…
Reconstructing 3D human motion and human-object interactions (HOI) from Internet videos is a fundamental step toward building large-scale datasets of human behavior. Existing methods struggle to recover globally consistent 3D motion under dynamic cameras, especially for motion ty…
arXiv cs.CV
TIER_1·Dennis Menn, Yuedong Yang, Bokun Wang, Xiwen Wei, Mustafa Munir, Feng Liang, Radu Marculescu, Chenfeng Xu, Diana Marculescu·
arXiv:2603.05811v2 Announce Type: replace Abstract: Current video generation models suffer from high computational latency, making real-time applications prohibitively costly. In this paper, we address this limitation by exploiting the temporal redundancy inherent in video latent…
arXiv:2604.25427v1 Announce Type: new Abstract: While large-scale video diffusion models have demonstrated impressive capabilities in generating high-resolution and semantically rich content, a significant gap remains between their pretraining performance and real-world deploymen…
arXiv cs.CV
TIER_1·Ruibin Li, Tao Yang, Fangzhou Ai, Tianhe Wu, Shilei Wen, Bingyue Peng, Lei Zhang·
arXiv:2604.10103v2 Announce Type: replace Abstract: Streaming video generation (SVG) distills a pretrained bidirectional video diffusion model into an autoregressive model equipped with sliding window attention (SWA). However, SWA inevitably loses distant history during long vide…
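The sliding window attention (SWA) that this abstract builds on can be illustrated with a boolean mask (a minimal sketch; the paper's actual windowing over video latents is more involved):

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Causal sliding-window attention mask: position i may attend to
    positions j with i - window < j <= i. Anything older than `window`
    tokens is masked out -- the loss of distant history noted above."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)
```

The hard cutoff is what makes SWA cheap for streaming generation, and also why long-video methods look for ways to retain or summarize history outside the window.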
arXiv:2506.23690v2 Announce Type: replace Abstract: Diffusion-based video motion customization facilitates the acquisition of human motion representations from a few video samples, while achieving arbitrary subjects transfer through precise textual conditioning. Existing approach…
arXiv:2604.23858v1 Announce Type: new Abstract: Video generation models, while capable of producing realistic videos, are computationally expensive and slow, prohibiting real-time applications. In this paper, we observe that video latents encoded via an autoencoder under the Latent Diff…
arXiv:2604.22808v1 Announce Type: new Abstract: Long-sequence video diffusion transformers hit a quadratic self-attention cost that dominates runtime and memory for very long token sequences. Most efficient attention methods use one approximation everywhere, yet video features ar…
arXiv cs.CV
TIER_1·Tristan S. W. Stevens, Oisín Nolan, Jean-Luc Robert, Ruud J. G. van Sloun·
arXiv:2509.20886v2 Announce Type: replace Abstract: Video sequences often contain structured noise and background artifacts that obscure dynamic content, posing challenges for accurate analysis and restoration. Robust principal component methods address this by decomposing data i…
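The robust principal component decomposition this abstract refers to splits a data matrix into a low-rank part (structured background) and a sparse part (artifacts/outliers). A basic principal component pursuit loop sketches the idea (an illustrative fixed-penalty ADMM-style sketch, not the authors' method; `lam` and `mu` are common heuristics):

```python
import numpy as np

def soft_threshold(X, tau):
    """Elementwise shrinkage: proximal operator of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Singular-value shrinkage: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt

def rpca(M, n_iter=500):
    """Decompose M ~= L + S with L low-rank and S sparse
    (principal component pursuit via a simple ADMM loop)."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))        # standard PCP sparsity weight
    mu = m * n / (4.0 * np.abs(M).sum())  # common penalty heuristic
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                  # dual variable for M = L + S
    for _ in range(n_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = soft_threshold(M - L + Y / mu, lam / mu)
        Y = Y + mu * (M - L - S)
    return L, S
```

For video, each column of M would be a vectorized frame, so L captures the slowly varying background and S the dynamic content or structured noise.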
We introduce Sparse Forcing, a training-and-inference paradigm for autoregressive video diffusion models that improves long-horizon generation quality while reducing decoding latency. Sparse Forcing is motivated by an empirical observation in autoregressive diffusion rollouts: at…