PulseAugur

Diffusion Models for Video Generation

Researchers are extending diffusion models to video generation, where the main challenges are temporal consistency across frames and the scarcity of high-quality video data. New methods refine the model parameterization (for example, the v-prediction technique) and add conditional sampling for tasks such as extending a video or filling in missing frames. Other work targets efficiency and controllability through post-training frameworks, hybrid attention mechanisms, and semantic-visual adaptation, with the aim of real-time generation and higher-quality output.
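As a rough illustration of the v-prediction parameterization mentioned in the summary, the sketch below computes the v target and inverts it back to the clean sample and the noise. It assumes a variance-preserving schedule (alpha_t² + sigma_t² = 1), under which v = alpha_t · eps − sigma_t · x0. The function names, the toy tensor shape, and the chosen noise level are hypothetical and are not taken from any of the listed sources.

```python
import numpy as np

def v_target(x0, eps, alpha_t, sigma_t):
    """Training target under v-prediction: v = alpha_t * eps - sigma_t * x0."""
    return alpha_t * eps - sigma_t * x0

def recover_x0(x_t, v, alpha_t, sigma_t):
    """Clean-sample estimate from the noisy input and v (needs alpha^2 + sigma^2 = 1)."""
    return alpha_t * x_t - sigma_t * v

def recover_eps(x_t, v, alpha_t, sigma_t):
    """Noise estimate from the noisy input and v (needs alpha^2 + sigma^2 = 1)."""
    return sigma_t * x_t + alpha_t * v

# Toy "video": 4 frames of 8x8 values (hypothetical shape, for illustration only).
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8, 8))
eps = rng.standard_normal(x0.shape)

# Assumed variance-preserving noise level, parameterized by an angle phi.
phi = 0.3 * np.pi / 2
alpha_t, sigma_t = np.cos(phi), np.sin(phi)

x_t = alpha_t * x0 + sigma_t * eps          # forward (noising) step
v = v_target(x0, eps, alpha_t, sigma_t)     # what a v-prediction network would regress

x0_hat = recover_x0(x_t, v, alpha_t, sigma_t)
eps_hat = recover_eps(x_t, v, alpha_t, sigma_t)
```

With the exact v, both inversions reproduce `x0` and `eps` up to floating-point error. In training, a network's prediction of `v` would stand in for the exact value.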

Impact: Advances in diffusion models are improving video generation quality, efficiency, and controllability, potentially enabling new applications in content creation and analysis.

Ranking rationale: Multiple arXiv papers and Hugging Face blog posts detail new research and techniques in video generation using diffusion models.

Read at Lil'Log (Lilian Weng) →

AI-generated summary · Google Gemini · from 16 sources. How we write summaries →

Sources [16]

  1. Hugging Face Blog TIER_1 Dansk(DA) ·

    State of open video generation models in Diffusers

  2. Lil'Log (Lilian Weng) TIER_1 English(EN) ·

    Diffusion Models for Video Generation

    Diffusion models (https://lilianweng.github.io/posts/2021-07-11-diffusion-models/) have demonstrated strong results on image synthesis in past years. Now the research community has started working on a harder task: using them for video generation. The task itsel…

  3. Hugging Face Blog TIER_1 Deutsch(DE) ·

    Introducing Würstchen: Fast Diffusion for Image Generation

  4. arXiv cs.LG TIER_1 English(EN) · Yifan F. Zhang, Fangjun Hu, Guangkuo Liu, Mert Okyay, Xun Gao ·

    Concurrence of Symmetry Breaking and Nonlocality Phase Transitions in Diffusion Models

    arXiv:2605.04830v1 Announce Type: new Abstract: Diffusion models undergo a phase transition in a critical time window during generation dynamics, with two complementary diagnoses of criticality. The symmetry breaking picture views the critical window as when trajectories bifurcat…

  5. Hugging Face Daily Papers TIER_1 English(EN) ·

    DCMorph: Face Morphing via Dual-Stream Cross-Attention Diffusion

    Advancing face morphing attack techniques is crucial to anticipate evolving threats and develop robust defensive mechanisms for identity verification systems. This work introduces DCMorph, a dual-stream diffusion-based morphing framework that simultaneously operates at both ident…

  6. Hugging Face Daily Papers TIER_1 English(EN) ·

    AnyLift: Scaling Motion Reconstruction from Internet Videos via 2D Diffusion

    Reconstructing 3D human motion and human-object interactions (HOI) from Internet videos is a fundamental step toward building large-scale datasets of human behavior. Existing methods struggle to recover globally consistent 3D motion under dynamic cameras, especially for motion ty…

  7. arXiv cs.CV TIER_1 English(EN) · Dennis Menn, Yuedong Yang, Bokun Wang, Xiwen Wei, Mustafa Munir, Feng Liang, Radu Marculescu, Chenfeng Xu, Diana Marculescu ·

    Video Compression Meets Video Generation: Latent Inter-Frame Pruning with Attention Recovery

    arXiv:2603.05811v2 Announce Type: replace Abstract: Current video generation models suffer from high computational latency, making real-time applications prohibitively costly. In this paper, we address this limitation by exploiting the temporal redundancy inherent in video latent…

  8. arXiv cs.CV TIER_1 English(EN) · Zeyue Xue, Siming Fu, Jie Huang, Shuai Lu, Haoran Li, Yijun Liu, Yuming Li, Xiaoxuan He, Mengzhao Chen, Haoyang Huang, Nan Duan, Ping Luo ·

    A Systematic Post-Train Framework for Video Generation

    arXiv:2604.25427v1 Announce Type: new Abstract: While large-scale video diffusion models have demonstrated impressive capabilities in generating high-resolution and semantically rich content, a significant gap remains between their pretraining performance and real-world deploymen…

  9. arXiv cs.CV TIER_1 English(EN) · Ruibin Li, Tao Yang, Fangzhou Ai, Tianhe Wu, Shilei Wen, Bingyue Peng, Lei Zhang ·

    Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation

    arXiv:2604.10103v2 Announce Type: replace Abstract: Streaming video generation (SVG) distills a pretrained bidirectional video diffusion model into an autoregressive model equipped with sliding window attention (SWA). However, SWA inevitably loses distant history during long vide…

  10. arXiv cs.CV TIER_1 English(EN) · Shuai Tan, Biao Gong, Yujie Wei, Shiwei Zhang, Zhuoxin Liu, Ke Ma, Yan Wang, Kecheng Zheng, Xing Zhu, Yujun Shen, Hengshuang Zhao ·

    SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation

    arXiv:2506.23690v2 Announce Type: replace Abstract: Diffusion-based video motion customization facilitates the acquisition of human motion representations from a few video samples, while achieving arbitrary subjects transfer through precise textual conditioning. Existing approach…

  11. arXiv cs.CV TIER_1 English(EN) · Ping Luo ·

    A Systematic Post-Train Framework for Video Generation

    While large-scale video diffusion models have demonstrated impressive capabilities in generating high-resolution and semantically rich content, a significant gap remains between their pretraining performance and real-world deployment requirements due to critical issues such as pr…

  12. arXiv cs.CV TIER_1 English(EN) · Dennis Menn, Chih-Hsien Chou ·

    Latent Inter-Frame Pruning: A Training-Free Method Bridging Traditional Video Compression and Modern Diffusion Transformers for Efficient Generation

    arXiv:2604.23858v1 Announce Type: new Abstract: Video generation, while capable of generating realistic videos, is computationally expensive and slow, prohibiting real-time applications. In this paper, we observe that video latents encoded via an autoencoder under the Latent Diff…

  13. arXiv cs.CV TIER_1 English(EN) · Haopeng Jin ·

    FreqFormer: Hierarchical Frequency-Domain Attention with Adaptive Spectral Routing for Long-Sequence Video Diffusion Transformers

    arXiv:2604.22808v1 Announce Type: new Abstract: Long-sequence video diffusion transformers hit a quadratic self-attention cost that dominates runtime and memory for very long token sequences. Most efficient attention methods use one approximation everywhere, yet video features ar…

  14. arXiv cs.CV TIER_1 English(EN) · Tristan S. W. Stevens, Oisín Nolan, Jean-Luc Robert, Ruud J. G. van Sloun ·

    Nuclear Diffusion Models for Low-Rank Background Suppression in Videos

    arXiv:2509.20886v2 Announce Type: replace Abstract: Video sequences often contain structured noise and background artifacts that obscure dynamic content, posing challenges for accurate analysis and restoration. Robust principal component methods address this by decomposing data i…

  15. arXiv cs.CV TIER_1 English(EN) · Naser Damer ·

    DCMorph: Face Morphing via Dual-Stream Cross-Attention Diffusion

    Advancing face morphing attack techniques is crucial to anticipate evolving threats and develop robust defensive mechanisms for identity verification systems. This work introduces DCMorph, a dual-stream diffusion-based morphing framework that simultaneously operates at both ident…

  16. arXiv cs.CV TIER_1 English(EN) · Peng Li ·

    Sparse Forcing: Native Trainable Sparse Attention for Real-time Autoregressive Diffusion Video Generation

    We introduce Sparse Forcing, a training-and-inference paradigm for autoregressive video diffusion models that improves long-horizon generation quality while reducing decoding latency. Sparse Forcing is motivated by an empirical observation in autoregressive diffusion rollouts: at…