PulseAugur
Diffusion Models for Video Generation

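Researchers are exploring advanced diffusion models for video generation to tackle challenges such as temporal consistency and data scarcity. New approaches focus on improved parameterizations, such as v-prediction, combined with conditional sampling for tasks like extending video length or filling in missing frames. In parallel, work on post-training frameworks, hybrid attention mechanisms, and semantic-visual adaptation aims to improve efficiency and controllability, with the goals of real-time generation and higher-quality output.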
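The v-prediction parameterization mentioned above can be sketched in a few lines. This minimal illustration assumes a variance-preserving forward process z_t = alpha_t * x0 + sigma_t * eps with alpha_t^2 + sigma_t^2 = 1. Under that assumption the network is trained to output v = alpha_t * eps - sigma_t * x0, and both x0 and eps can be recovered from it. The function names are hypothetical and do not come from any particular library.

```python
import numpy as np

def v_target(x0, eps, alpha, sigma):
    """Training target for v-prediction: v = alpha * eps - sigma * x0."""
    return alpha * eps - sigma * x0

def x0_from_v(z, v, alpha, sigma):
    """Recover the clean sample: x0 = alpha * z - sigma * v (requires alpha^2 + sigma^2 = 1)."""
    return alpha * z - sigma * v

def eps_from_v(z, v, alpha, sigma):
    """Recover the noise: eps = sigma * z + alpha * v (requires alpha^2 + sigma^2 = 1)."""
    return sigma * z + alpha * v

# Round trip on random "video" data shaped (frames, height, width, channels).
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8, 8, 3))
eps = rng.standard_normal(x0.shape)
phi = 0.3 * np.pi                        # angle form: alpha = cos(phi), sigma = sin(phi)
alpha, sigma = np.cos(phi), np.sin(phi)
z = alpha * x0 + sigma * eps             # noisy sample at this noise level
v = v_target(x0, eps, alpha, sigma)      # what the network would be trained to output

print(np.allclose(x0_from_v(z, v, alpha, sigma), x0))    # True
print(np.allclose(eps_from_v(z, v, alpha, sigma), eps))  # True
```

Recovering x0 from v never divides by alpha_t. With eps-prediction, x0 = (z_t - sigma_t * eps) / alpha_t, which becomes ill-conditioned as alpha_t approaches 0 at high noise levels. This is one commonly cited reason for preferring v-prediction.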

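Impact: Advances in diffusion models are improving the quality, efficiency, and controllability of video generation, with potential uses in new content-creation and analysis applications.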

Ranking rationale: Multiple arXiv papers and Hugging Face blog posts describe new research and techniques for video generation with diffusion models.

Read on Lil'Log (Lilian Weng)

AI-generated summary · Google Gemini · from 16 sources.
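The summary above mentions conditional sampling for extending a clip or filling in missing frames. The simplest form is the "replacement" approach. At every denoising step, the frames that are already known are overwritten with a freshly noised copy of the ground truth at the current noise level, so the model denoises the unknown frames jointly with them.

This sketch makes several assumptions:
- It uses a deterministic DDIM-style update.
- It replaces the trained network with a toy closed-form denoiser: the exact posterior mean under a hypothetical "static video" prior, where every frame equals the same Gaussian image.
- `static_video_denoiser` and `sample_with_replacement` are illustrative names only.

Reconstruction guidance is a known refinement of this naive scheme and is not shown.

```python
import numpy as np

def static_video_denoiser(z, alpha, sigma):
    """Toy x0-predictor (hypothetical prior): all frames share one image c ~ N(0, I).
    Given z_f = alpha * c + sigma * eps_f, the posterior mean of c pools every frame."""
    n_frames = z.shape[0]
    c_hat = alpha * z.sum(axis=0) / (sigma**2 + n_frames * alpha**2)
    return np.broadcast_to(c_hat, z.shape)

def ddim_step(z, x0_hat, alpha_t, sigma_t, alpha_s, sigma_s):
    """Deterministic DDIM update from noise level t to a less noisy level s."""
    eps_hat = (z - alpha_t * x0_hat) / sigma_t
    return alpha_s * x0_hat + sigma_s * eps_hat

def sample_with_replacement(denoise, known, known_mask, alphas, sigmas, rng):
    """known: (frames, dims) array; rows where known_mask is True are the conditioning frames."""
    z = rng.standard_normal(known.shape)  # start from (almost) pure noise
    for i in range(len(alphas) - 1):
        a_t, s_t = alphas[i], sigmas[i]
        # Replacement: overwrite known frames with a forward-noised copy at level t.
        noise = rng.standard_normal(known[known_mask].shape)
        z[known_mask] = a_t * known[known_mask] + s_t * noise
        x0_hat = denoise(z, a_t, s_t)
        z = ddim_step(z, x0_hat, a_t, s_t, alphas[i + 1], sigmas[i + 1])
    z[known_mask] = known[known_mask]  # the final level (alpha=1, sigma=0) is noise-free
    return z

rng = np.random.default_rng(0)
alphas = np.linspace(0.02, 1.0, 50)   # assumed schedule, ending at alpha = 1
sigmas = np.sqrt(1.0 - alphas**2)     # variance preserving: alpha^2 + sigma^2 = 1
frame = rng.standard_normal(16)       # one "image" flattened to 16 values
known = np.stack([frame, frame, np.zeros(16), np.zeros(16)])  # frames 2 and 3 are missing
known_mask = np.array([True, True, False, False])

video = sample_with_replacement(static_video_denoiser, known, known_mask, alphas, sigmas, rng)
print(np.abs(video[2] - frame).max())  # gap between a filled-in frame and the known content
```

Because the toy denoiser pools all frames, the known frames pull the missing ones toward their content. With a real model, the same loop applies with `denoise` swapped for the network's x0 estimate.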

Sources [16]

  1. Hugging Face Blog TIER_1 Dansk(DA) ·

    State of open video generation models in Diffusers

  2. Lil'Log (Lilian Weng) TIER_1 English(EN) ·

    Diffusion Models for Video Generation

    Diffusion models (https://lilianweng.github.io/posts/2021-07-11-diffusion-models/) have demonstrated strong results on image synthesis in past years. Now the research community has started working on a harder task: using it for video generation. The task itsel…

  3. Hugging Face Blog TIER_1 Deutsch(DE) ·

    Introducing Würstchen: Fast Diffusion for Image Generation

  4. arXiv cs.LG TIER_1 English(EN) · Yifan F. Zhang, Fangjun Hu, Guangkuo Liu, Mert Okyay, Xun Gao ·

    Concurrence of Symmetry Breaking and Nonlocality Phase Transitions in Diffusion Models

    arXiv:2605.04830v1 Announce Type: new Abstract: Diffusion models undergo a phase transition in a critical time window during generation dynamics, with two complementary diagnoses of criticality. The symmetry breaking picture views the critical window as when trajectories bifurcat…

  5. Hugging Face Daily Papers TIER_1 English(EN) ·

    DCMorph: Face Morphing via Dual-Stream Cross-Attention Diffusion

    Advancing face morphing attack techniques is crucial to anticipate evolving threats and develop robust defensive mechanisms for identity verification systems. This work introduces DCMorph, a dual-stream diffusion-based morphing framework that simultaneously operates at both ident…

  6. Hugging Face Daily Papers TIER_1 English(EN) ·

    AnyLift: Scaling Motion Reconstruction from Internet Videos via 2D Diffusion

    Reconstructing 3D human motion and human-object interactions (HOI) from Internet videos is a fundamental step toward building large-scale datasets of human behavior. Existing methods struggle to recover globally consistent 3D motion under dynamic cameras, especially for motion ty…

  7. arXiv cs.CV TIER_1 English(EN) · Dennis Menn, Yuedong Yang, Bokun Wang, Xiwen Wei, Mustafa Munir, Feng Liang, Radu Marculescu, Chenfeng Xu, Diana Marculescu ·

    Video Compression Meets Video Generation: Latent Inter-Frame Pruning with Attention Recovery

    arXiv:2603.05811v2 Announce Type: replace Abstract: Current video generation models suffer from high computational latency, making real-time applications prohibitively costly. In this paper, we address this limitation by exploiting the temporal redundancy inherent in video latent…

  8. arXiv cs.CV TIER_1 English(EN) · Zeyue Xue, Siming Fu, Jie Huang, Shuai Lu, Haoran Li, Yijun Liu, Yuming Li, Xiaoxuan He, Mengzhao Chen, Haoyang Huang, Nan Duan, Ping Luo ·

    A Systematic Post-Train Framework for Video Generation

    arXiv:2604.25427v1 Announce Type: new Abstract: While large-scale video diffusion models have demonstrated impressive capabilities in generating high-resolution and semantically rich content, a significant gap remains between their pretraining performance and real-world deploymen…

  9. arXiv cs.CV TIER_1 English(EN) · Ruibin Li, Tao Yang, Fangzhou Ai, Tianhe Wu, Shilei Wen, Bingyue Peng, Lei Zhang ·

    Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation

    arXiv:2604.10103v2 Announce Type: replace Abstract: Streaming video generation (SVG) distills a pretrained bidirectional video diffusion model into an autoregressive model equipped with sliding window attention (SWA). However, SWA inevitably loses distant history during long vide…

  10. arXiv cs.CV TIER_1 English(EN) · Shuai Tan, Biao Gong, Yujie Wei, Shiwei Zhang, Zhuoxin Liu, Ke Ma, Yan Wang, Kecheng Zheng, Xing Zhu, Yujun Shen, Hengshuang Zhao ·

    SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation

    arXiv:2506.23690v2 Announce Type: replace Abstract: Diffusion-based video motion customization facilitates the acquisition of human motion representations from a few video samples, while achieving arbitrary subjects transfer through precise textual conditioning. Existing approach…

  11. arXiv cs.CV TIER_1 English(EN) · Ping Luo ·

    A Systematic Post-Train Framework for Video Generation

    While large-scale video diffusion models have demonstrated impressive capabilities in generating high-resolution and semantically rich content, a significant gap remains between their pretraining performance and real-world deployment requirements due to critical issues such as pr…

  12. arXiv cs.CV TIER_1 English(EN) · Dennis Menn, Chih-Hsien Chou ·

    Latent Inter-Frame Pruning: A Training-Free Method Bridging Traditional Video Compression and Modern Diffusion Transformers for Efficient Generation

    arXiv:2604.23858v1 Announce Type: new Abstract: Video generation, while capable of generating realistic videos, is computationally expensive and slow, prohibiting real-time applications. In this paper, we observe that video latents encoded via an autoencoder under the Latent Diff…

  13. arXiv cs.CV TIER_1 English(EN) · Haopeng Jin ·

    FreqFormer: Hierarchical Frequency-Domain Attention with Adaptive Spectral Routing for Long-Sequence Video Diffusion Transformers

    arXiv:2604.22808v1 Announce Type: new Abstract: Long-sequence video diffusion transformers hit a quadratic self-attention cost that dominates runtime and memory for very long token sequences. Most efficient attention methods use one approximation everywhere, yet video features ar…

  14. arXiv cs.CV TIER_1 English(EN) · Tristan S. W. Stevens, Oisín Nolan, Jean-Luc Robert, Ruud J. G. van Sloun ·

    Nuclear Diffusion Models for Low-Rank Background Suppression in Videos

    arXiv:2509.20886v2 Announce Type: replace Abstract: Video sequences often contain structured noise and background artifacts that obscure dynamic content, posing challenges for accurate analysis and restoration. Robust principal component methods address this by decomposing data i…

  15. arXiv cs.CV TIER_1 English(EN) · Naser Damer ·

    DCMorph: Face Morphing via Dual-Stream Cross-Attention Diffusion

    Advancing face morphing attack techniques is crucial to anticipate evolving threats and develop robust defensive mechanisms for identity verification systems. This work introduces DCMorph, a dual-stream diffusion-based morphing framework that simultaneously operates at both ident…

  16. arXiv cs.CV TIER_1 English(EN) · Peng Li ·

    Sparse Forcing: Native Trainable Sparse Attention for Real-time Autoregressive Diffusion Video Generation

    We introduce Sparse Forcing, a training-and-inference paradigm for autoregressive video diffusion models that improves long-horizon generation quality while reducing decoding latency. Sparse Forcing is motivated by an empirical observation in autoregressive diffusion rollouts: at…
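The "Concurrence of Symmetry Breaking and Nonlocality Phase Transitions in Diffusion Models" entry describes a critical window in which generation trajectories bifurcate. Here is a toy one-dimensional illustration of symmetry breaking. It is not the paper's analysis.

Assume the data are the two points -1 and +1 and the forward process is z_t = alpha_t * x0 + sigma_t * eps, with alpha_t^2 + sigma_t^2 = 1. The noised density is then an equal mixture of N(+alpha_t, sigma_t^2) and N(-alpha_t, sigma_t^2). That mixture has one mode while alpha_t < sigma_t and splits into two once alpha_t > sigma_t, which is at alpha_t = 1/sqrt(2) in this setup. The helper names are illustrative.

```python
import numpy as np

def noised_density(x, alpha, sigma):
    """Density of z_t when x0 is -1 or +1 with equal probability."""
    gauss = lambda m: np.exp(-((x - m) ** 2) / (2 * sigma**2))
    return 0.5 * (gauss(alpha) + gauss(-alpha)) / (sigma * np.sqrt(2 * np.pi))

def count_modes(alpha, grid=None):
    """Count strict local maxima of the noised density on a grid (variance-preserving sigma)."""
    grid = np.linspace(-4.0, 4.0, 4001) if grid is None else grid
    sigma = np.sqrt(1.0 - alpha**2)
    p = noised_density(grid, alpha, sigma)
    peaks = (p[1:-1] > p[:-2]) & (p[1:-1] > p[2:])
    return int(peaks.sum())

# Sweep from very noisy (small alpha) to almost clean (alpha near 1).
for alpha in [0.3, 0.5, 0.65, 0.75, 0.9]:
    print(f"alpha={alpha:.2f} sigma={np.sqrt(1 - alpha**2):.2f} modes={count_modes(alpha)}")
# modes: 1, 1, 1, 2, 2
```

In this toy, a reverse-time trajectory only has to commit to one of the two outcomes once the density splits. That split is the simplest picture of the bifurcation the abstract refers to.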
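Both Latent Inter-Frame Pruning entries exploit temporal redundancy between video latents. Their actual procedure, including the attention-recovery step, is not reproduced here. As a loose, codec-style illustration of the underlying idea only, this sketch marks a latent token as prunable when it barely differs from the last kept token at the same spatial position, and reuses that reference token instead. The relative threshold of 0.05, the (frames, tokens, channels) layout, and the function names are all assumptions.

```python
import numpy as np

def interframe_keep_mask(latents, rel_threshold=0.05):
    """latents: (frames, tokens, channels). Returns a bool mask of tokens to recompute.

    Frame 0 is always kept. A later token is kept only if it moved more than
    rel_threshold (relative L2 change) away from the last kept token at that position."""
    n_frames, n_tokens, _ = latents.shape
    keep = np.zeros((n_frames, n_tokens), dtype=bool)
    keep[0] = True
    reference = latents[0].copy()
    for t in range(1, n_frames):
        change = np.linalg.norm(latents[t] - reference, axis=-1)
        scale = np.linalg.norm(reference, axis=-1) + 1e-8
        keep[t] = change / scale > rel_threshold
        reference[keep[t]] = latents[t][keep[t]]  # update references only where kept
    return keep

def fill_pruned(latents, keep):
    """Rebuild full latents, copying the previous frame's token wherever a token was pruned."""
    out = latents.copy()
    for t in range(1, latents.shape[0]):
        out[t][~keep[t]] = out[t - 1][~keep[t]]
    return out

rng = np.random.default_rng(0)
clip = np.repeat(rng.standard_normal((1, 6, 4)), 5, axis=0)  # 5 identical frames, 6 tokens
clip[3:, 2] += 1.0  # token 2 changes from frame 3 onward (a "moving object")
keep = interframe_keep_mask(clip)
print(keep.sum(), "of", keep.size, "tokens recomputed")  # 7 of 30 tokens recomputed
```

Only the first frame and the one token that actually changes are recomputed. The rest are copied forward, which is where the savings would come from in a real model.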
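The Long-Horizon Streaming Video Generation entry notes that sliding window attention (SWA) inevitably loses distant history. A minimal NumPy sketch of a causal sliding-window mask shows why. With a window of W tokens, token i can attend only to tokens i-W+1 through i, so anything older is unreachable within a single attention layer. The sizes below are arbitrary, and this is generic SWA, not the paper's hybrid attention.

```python
import numpy as np

def sliding_window_causal_mask(n_tokens, window):
    """mask[i, j] is True when query i may attend to key j (causal, last `window` keys only)."""
    i = np.arange(n_tokens)[:, None]
    j = np.arange(n_tokens)[None, :]
    return (j <= i) & (j > i - window)

def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention with a boolean mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)                      # masked entries become exactly 0
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
n, d, window = 8, 4, 3
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out, weights = masked_attention(q, k, v, sliding_window_causal_mask(n, window))
print(weights[-1].round(3))  # the last query puts zero weight on keys 0 to 4
```

Every row keeps at least its diagonal entry, so the softmax is always well defined. The price is that per-layer memory and compute are bounded by the window rather than by the full history.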
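The FreqFormer entry points out that full self-attention cost grows quadratically with the number of tokens. This arithmetic makes that concrete under assumed settings: a latent autoencoder that downsamples time by 4 and space by 8, followed by 2x2 patchification. These factors and the 480x832 resolution are illustrative, not taken from the paper.

```python
def video_tokens(frames, height, width, t_down=4, s_down=8, patch=2):
    """Token count after (assumed) latent downsampling and spatial patchification."""
    return (frames // t_down) * (height // (s_down * patch)) * (width // (s_down * patch))

def attention_scores(n_tokens):
    """Entries in one full n x n attention score matrix (one head, one layer)."""
    return n_tokens * n_tokens

for frames in (64, 128, 256):
    n = video_tokens(frames, 480, 832)
    print(f"{frames:4d} frames -> {n:7d} tokens -> {attention_scores(n):.2e} scores")
```

Doubling the clip length doubles the token count, for example from 24960 to 49920 tokens. It quadruples the number of attention scores per head and layer, which is why long sequences motivate approximations such as the frequency-domain routing the entry describes.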
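The Nuclear Diffusion Models entry starts from robust principal component methods. These split a data matrix into a low-rank part (the slowly varying background) and a sparse part (the moving content). Here is a minimal sketch of that classical baseline, assuming each video frame is flattened into one column. It solves principal component pursuit with the standard ADMM iterations: singular value thresholding for the nuclear-norm term and soft thresholding for the l1 term. This is the robust PCA starting point the abstract mentions, not the paper's diffusion-based model. The default `lam` and `mu` follow common choices in robust PCA implementations, and the synthetic video is made up for illustration.

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise shrinkage: proximal operator of tau * l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def singular_value_threshold(x, tau):
    """Shrink singular values: proximal operator of tau * nuclear norm."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt

def robust_pca(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split M into low-rank L plus sparse S by principal component pursuit (ADMM)."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    mu = m * n / (4.0 * np.abs(M).sum()) if mu is None else mu
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # Lagrange multiplier for the constraint M = L + S
    for _ in range(max_iter):
        L = singular_value_threshold(M - S + Y / mu, 1.0 / mu)
        S = soft_threshold(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y += mu * residual
        if np.linalg.norm(residual) <= tol * np.linalg.norm(M):
            break
    return L, S

rng = np.random.default_rng(0)
pixels, frames = 200, 40
background = np.outer(rng.uniform(0.2, 1.0, pixels), np.ones(frames))  # static scene: rank 1
foreground = np.zeros((pixels, frames))
moving = rng.random((pixels, frames)) < 0.05                            # about 5% "object" pixels
foreground[moving] = rng.uniform(-1.0, 1.0, moving.sum())
video = background + foreground
L, S = robust_pca(video)
print(np.linalg.norm(L - background) / np.linalg.norm(background))  # relative background error
```

The l1 term is what assumes the foreground is sparse and unstructured. Per its title and abstract, the paper targets structured noise and background artifacts, which that assumption handles poorly.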