Diffusion Models for Video Generation

By the PulseAugur editorial team · [16 sources] · 2023-09-13 00:00

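Researchers are exploring advanced diffusion models for video generation to address challenges such as temporal consistency and data scarcity. New approaches focus on improved parameterizations, such as v-prediction, combined with conditional sampling for tasks such as extending a video's length or filling in missing frames. Work is also under way to improve efficiency and controllability through post-training frameworks, hybrid attention mechanisms, and semantic-visual adaptation, with the goal of real-time generation and higher-quality output.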
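The summary names v-prediction without defining it. As a minimal sketch, assuming the commonly used definition for a variance-preserving process (x_t = alpha_t * x_0 + sigma_t * eps with alpha_t^2 + sigma_t^2 = 1, and target v = alpha_t * eps - sigma_t * x_0), the relationship between the three quantities can be written as below. The cosine schedule and all function names are illustrative assumptions, not taken from any of the cited sources.

```python
import math
import random


def alpha_sigma(t):
    """Assumed cosine schedule on t in [0, 1]; satisfies alpha^2 + sigma^2 = 1."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)


def add_noise(x0, eps, t):
    """Forward process: x_t = alpha_t * x_0 + sigma_t * eps."""
    a, s = alpha_sigma(t)
    return [a * x + s * e for x, e in zip(x0, eps)]


def v_target(x0, eps, t):
    """Training target under v-prediction: v = alpha_t * eps - sigma_t * x_0."""
    a, s = alpha_sigma(t)
    return [a * e - s * x for x, e in zip(x0, eps)]


def x0_from_v(xt, v, t):
    """Recover the clean sample: x_0 = alpha_t * x_t - sigma_t * v."""
    a, s = alpha_sigma(t)
    return [a * x - s * vi for x, vi in zip(xt, v)]


def eps_from_v(xt, v, t):
    """Recover the noise: eps = sigma_t * x_t + alpha_t * v."""
    a, s = alpha_sigma(t)
    return [s * x + a * vi for x, vi in zip(xt, v)]


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [rng.gauss(0.0, 1.0) for _ in range(4)]   # stand-in for pixel/latent values
    eps = [rng.gauss(0.0, 1.0) for _ in range(4)]  # Gaussian noise
    t = 0.3
    xt = add_noise(x0, eps, t)
    v = v_target(x0, eps, t)
    # With the exact v, both x_0 and eps are recovered up to rounding error.
    print(max(abs(a - b) for a, b in zip(x0_from_v(xt, v, t), x0)))
    print(max(abs(a - b) for a, b in zip(eps_from_v(xt, v, t), eps)))
```

In a real model a network would predict v from x_t and t. Here the exact target is used, only to show that x_0 and eps are both recoverable from the same prediction.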

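Impact: Advances in diffusion models are improving the quality, efficiency, and controllability of video generation, with potential uses in new content creation and analysis applications.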

Ranking rationale: Multiple arXiv papers and Hugging Face blog posts detail new research and techniques for video generation with diffusion models.

Read on Lil'Log (Lilian Weng) →

AI-generated summary · Google Gemini · from 16 sources. How we write summaries →

Coverage sources [16]

Hugging Face Blog TIER_1 Dansk(DA) · 2025-01-27 00:00

State of open video generation models in Diffusers
Lil'Log (Lilian Weng) TIER_1 English(EN) · 2024-04-12 00:00

Diffusion Models for Video Generation

Diffusion models (https://lilianweng.github.io/posts/2021-07-11-diffusion-models/) have demonstrated strong results on image synthesis in past years. Now the research community has started working on a harder task, using it for video generation. The task itsel…
Hugging Face Blog TIER_1 Deutsch(DE) · 2023-09-13 00:00

Introducing Würstchen: Fast Diffusion for Image Generation
arXiv cs.LG TIER_1 English(EN) · Yifan F. Zhang, Fangjun Hu, Guangkuo Liu, Mert Okyay, Xun Gao · 2026-05-07 04:00

Simultaneous Symmetry Breaking and Nonlocal Phase Transition in Diffusion Models

arXiv:2605.04830v1 Announce Type: new Abstract: Diffusion models undergo a phase transition in a critical time window during generation dynamics, with two complementary diagnoses of criticality. The symmetry breaking picture views the critical window as when trajectories bifurcat…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-04-23 12:46

DCMorph: Face Morphing via Dual-Stream Cross-Attention Diffusion

Advancing face morphing attack techniques is crucial to anticipate evolving threats and develop robust defensive mechanisms for identity verification systems. This work introduces DCMorph, a dual-stream diffusion-based morphing framework that simultaneously operates at both ident…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-04-20 05:15

AnyLift: Scaling Motion Reconstruction from Internet Videos via 2D Diffusion

Reconstructing 3D human motion and human-object interactions (HOI) from Internet videos is a fundamental step toward building large-scale datasets of human behavior. Existing methods struggle to recover globally consistent 3D motion under dynamic cameras, especially for motion ty…
arXiv cs.CV TIER_1 English(EN) · Dennis Menn, Yuedong Yang, Bokun Wang, Xiwen Wei, Mustafa Munir, Feng Liang, Radu Marculescu, Chenfeng Xu, Diana Marculescu · 2026-04-30 04:00

Video Compression Meets Video Generation: Latent Inter-Frame Pruning with Attention Recovery

arXiv:2603.05811v2 Announce Type: replace Abstract: Current video generation models suffer from high computational latency, making real-time applications prohibitively costly. In this paper, we address this limitation by exploiting the temporal redundancy inherent in video latent…
arXiv cs.CV TIER_1 English(EN) · Zeyue Xue, Siming Fu, Jie Huang, Shuai Lu, Haoran Li, Yijun Liu, Yuming Li, Xiaoxuan He, Mengzhao Chen, Haoyang Huang, Nan Duan, Ping Luo · 2026-04-29 04:00

A Systematic Post-Training Framework for Video Generation

arXiv:2604.25427v1 Announce Type: new Abstract: While large-scale video diffusion models have demonstrated impressive capabilities in generating high-resolution and semantically rich content, a significant gap remains between their pretraining performance and real-world deploymen…
arXiv cs.CV TIER_1 English(EN) · Ruibin Li, Tao Yang, Fangzhou Ai, Tianhe Wu, Shilei Wen, Bingyue Peng, Lei Zhang · 2026-04-29 04:00

Long-Horizon Streaming Video Generation via Hybrid Attention and Decoupled Distillation

arXiv:2604.10103v2 Announce Type: replace Abstract: Streaming video generation (SVG) distills a pretrained bidirectional video diffusion model into an autoregressive model equipped with sliding window attention (SWA). However, SWA inevitably loses distant history during long vide…
arXiv cs.CV TIER_1 English(EN) · Shuai Tan, Biao Gong, Yujie Wei, Shiwei Zhang, Zhuoxin Liu, Ke Ma, Yan Wang, Kecheng Zheng, Xing Zhu, Yujun Shen, Hengshuang Zhao · 2026-04-29 04:00

SynMotion: Semantic-Visual Adaptation for Motion-Customized Video Generation

arXiv:2506.23690v2 Announce Type: replace Abstract: Diffusion-based video motion customization facilitates the acquisition of human motion representations from a few video samples, while achieving arbitrary subjects transfer through precise textual conditioning. Existing approach…
arXiv cs.CV TIER_1 English(EN) · Ping Luo · 2026-04-28 09:34

A Systematic Post-Training Framework for Video Generation

While large-scale video diffusion models have demonstrated impressive capabilities in generating high-resolution and semantically rich content, a significant gap remains between their pretraining performance and real-world deployment requirements due to critical issues such as pr…
arXiv cs.CV TIER_1 English(EN) · Dennis Menn, Chih-Hsien Chou · 2026-04-28 04:00

Latent Inter-Frame Pruning: A Training-Free Method Bridging Traditional Video Compression and Modern Diffusion Transformers for Efficient Generation

arXiv:2604.23858v1 Announce Type: new Abstract: Video generation, while capable of generating realistic videos, is computationally expensive and slow, prohibiting real-time applications. In this paper, we observe that video latents encoded via an autoencoder under the Latent Diff…
arXiv cs.CV TIER_1 English(EN) · Haopeng Jin · 2026-04-28 04:00

FreqFormer: Hierarchical Frequency-Domain Attention with Adaptive Spectral Routing for Long-Sequence Video Diffusion Transformers

arXiv:2604.22808v1 Announce Type: new Abstract: Long-sequence video diffusion transformers hit a quadratic self-attention cost that dominates runtime and memory for very long token sequences. Most efficient attention methods use one approximation everywhere, yet video features ar…
arXiv cs.CV TIER_1 English(EN) · Tristan S. W. Stevens, Oisín Nolan, Jean-Luc Robert, Ruud J. G. van Sloun · 2026-04-27 04:00

Nuclear Diffusion Models for Low-Rank Background Suppression in Videos

arXiv:2509.20886v2 Announce Type: replace Abstract: Video sequences often contain structured noise and background artifacts that obscure dynamic content, posing challenges for accurate analysis and restoration. Robust principal component methods address this by decomposing data i…
arXiv cs.CV TIER_1 English(EN) · Naser Damer · 2026-04-23 12:46

DCMorph: Face Morphing via Dual-Stream Cross-Attention Diffusion

Advancing face morphing attack techniques is crucial to anticipate evolving threats and develop robust defensive mechanisms for identity verification systems. This work introduces DCMorph, a dual-stream diffusion-based morphing framework that simultaneously operates at both ident…
arXiv cs.CV TIER_1 English(EN) · Peng Li · 2026-04-23 02:22

Sparse Forcing: Native Trainable Sparse Attention for Real-Time Autoregressive Diffusion Video Generation

We introduce Sparse Forcing, a training-and-inference paradigm for autoregressive video diffusion models that improves long-horizon generation quality while reducing decoding latency. Sparse Forcing is motivated by an empirical observation in autoregressive diffusion rollouts: at…
