RhymeFlow: Training-Free Acceleration for Video Generation with Asynchronous Denoising Flow Scheduling

New Research Improves Control and Efficiency in Video Generation

By the PulseAugur editorial team · [22 sources] · 2026-05-22 00:00

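Researchers are developing new methods to improve video generation models, focusing on control, efficiency, and quality. One method, LA-LQR, uses optimal control to steer video generation models, reducing undesired content while preserving visual fidelity. Another line of work compresses large video diffusion models such as Wan2.2 through distillation and low-bit quantization to make them easier to deploy. In addition, new frameworks are emerging that give video generation explicit 3D control and awareness, going beyond 2D projections to better capture complex scene dynamics and human motion.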

Impact: Advances in control, efficiency, and 3D awareness are pushing the boundaries of what video generation can do.

Ranking rationale: Multiple academic papers published on arXiv detail new methods and frameworks for video generation models.


AI-generated summary · Google Gemini · from 22 sources.

Sources [22]

arXiv cs.AI TIER_1 English(EN) · Jihoon Hong, Alice Chan, Qiyue Dai, Julian Skifstad, Glen Chou · 2026-06-04 04:00

Activation Steering for Video Generation Models via Reduced-Order Linear Optimal Control

arXiv:2606.04775v1 Announce Type: cross Abstract: Text-to-video (T2V) models trained on large-scale web data can generate undesired content, motivating interventions that reduce harmful outputs without sacrificing visual quality. Activation steering offers an attractive mechanist…
arXiv cs.LG TIER_1 English(EN) · Glen Chou · 2026-06-03 11:58

Activation Steering for Video Generation Models via Reduced-Order Linear Optimal Control

Text-to-video (T2V) models trained on large-scale web data can generate undesired content, motivating interventions that reduce harmful outputs without sacrificing visual quality. Activation steering offers an attractive mechanistic alternative to finetuning and prompt filtering,…
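Optimal-control steering of this kind builds on the linear-quadratic regulator (LQR). The sketch below is a generic scalar, finite-horizon LQR solved with the backward Riccati recursion; it is not the paper's LA-LQR method. The dynamics coefficients `a`, `b`, the weights `q`, `r`, `q_final`, and the helper names `lqr_gains` and `rollout` are hypothetical stand-ins for a reduced-order linear model of activation dynamics.

```python
def lqr_gains(a, b, q, r, q_final, horizon):
    """Backward Riccati recursion for x[t+1] = a*x[t] + b*u[t] with cost
    sum_t (q*x[t]**2 + r*u[t]**2) + q_final*x[T]**2 (hypothetical scalar model)."""
    p = q_final  # cost-to-go weight at the final step
    gains = []
    for _ in range(horizon):
        k = (b * p * a) / (r + b * b * p)  # optimal feedback gain for this step
        p = q + a * a * p - a * b * p * k  # Riccati update of the cost-to-go
        gains.append(k)
    gains.reverse()  # computed backward in time; gains[t] now applies at step t
    return gains

def rollout(x0, a, b, gains):
    """Apply the feedback law u[t] = -K[t] * x[t] and simulate the state."""
    xs, us = [x0], []
    x = x0
    for k in gains:
        u = -k * x
        x = a * x + b * u
        us.append(u)
        xs.append(x)
    return xs, us

gains = lqr_gains(a=1.05, b=0.5, q=1.0, r=0.1, q_final=10.0, horizon=20)
xs, us = rollout(3.0, 1.05, 0.5, gains)
print(f"initial deviation {xs[0]:.2f}, final deviation {xs[-1]:.2e}")
```

Even though the open-loop drift `a = 1.05` would amplify the deviation, the feedback drives it toward zero while the `r` term penalizes large interventions, which mirrors the trade-off between suppressing undesired content and preserving quality.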
arXiv cs.AI TIER_1 English(EN) · Jiayi Wu, Haoming Cai, Cornelia Fermuller, Christopher Metzler, Yiannis Aloimonos · 2026-06-02 04:00

Real2SAM2Real: Generative 3D Cache as Complementary Context for Video Diffusion

arXiv:2606.00299v1 Announce Type: cross Abstract: While Video Diffusion Models (VDMs) excel at synthesizing high-fidelity videos, enabling precise camera and scene control remains challenging. Existing methods predominantly rely on implicit diffusion priors to generate unobserved…
arXiv cs.AI TIER_1 English(EN) · Jinyang Du, Shenghao Jin, Ziqian Xu, Ruihao Gong, Shiqiao Gu, Yang Yong, Jinyang Guo, Xianglong Liu · 2026-06-02 04:00

Collaborative Few-Step Distillation and Low-Bit Quantization for the Wan2.2 Dual-Expert Video Diffusion Model

arXiv:2606.00658v1 Announce Type: cross Abstract: Large video diffusion models achieve strong visual quality but remain expensive to deploy because each sample requires many denoising steps and a large resident parameter footprint. This paper studies a deployment-oriented compres…
arXiv cs.AI TIER_1 English(EN) · Jingyun Liang, Min Wei, Shikai Li, Yizeng Han, Hangjie Yuan, Lei Sun, Weihua Chen, Fan Wang · 2026-06-02 04:00

Towards 3D-Aware Video Diffusion Models: Rendering-Free Human Motion Control via Mesh Tokenization

arXiv:2606.02000v1 Announce Type: cross Abstract: Diffusion models have shown remarkable success in video generation. However, whether such models are truly aware of the 3D structure underlying visual observations, rather than simply reproducing plausible 2D projections, remains …
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-02 00:00

AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation

The AAD-1 framework improves one-step autoregressive image-to-video generation by breaking generator-discriminator symmetry and using phased training to prevent motion collapse and training instability.
arXiv cs.AI TIER_1 English(EN) · Ruotong Liao, Guowen Huang, Qing Cheng, Guangyao Zhai, Lei Zhang, Xun Xiao, Thomas Seidl, Daniel Cremers, Volker Tresp · 2026-06-01 04:00

TunerDiT: Training-Free Progressive Diffusion Transformer Guidance for Multi-Event Video Generation

arXiv:2605.31590v1 Announce Type: cross Abstract: Text-to-video (T2V) generation faces challenging questions when generating videos with long horizons containing multiple events. Inspired by the intrinsics of the diffusion process, we probe video diffusion transformers (DiTs) and…
arXiv cs.AI TIER_1 English(EN) · Volker Tresp · 2026-05-29 17:56

TunerDiT: Training-Free Progressive Diffusion Transformer Guidance for Multi-Event Video Generation

Text-to-video (T2V) generation faces challenging questions when generating videos with long horizons containing multiple events. Inspired by the intrinsics of the diffusion process, we probe video diffusion transformers (DiTs) and uncover intrinsic turning points in the DiT denoi…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-28 00:00

VideoMLA: Low-Rank Latent KV Cache for Minute-Level Autoregressive Video Diffusion

VideoMLA reduces memory usage in video diffusion models by replacing per-head keys and values with shared low-rank content and decoupled 3D-RoPE positional keys, maintaining quality while achieving significant compression and improved throughput.
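To see where the memory savings come from, compare per-token cache sizes. This sketch assumes a layout in the style of multi-head latent attention (MLA): a standard cache stores a key and a value for every head, while the compressed cache stores one shared low-rank latent plus one small decoupled positional key per token. All dimensions and the helper names are hypothetical, not VideoMLA's reported configuration.

```python
def standard_kv_bytes(tokens, layers, heads, head_dim, bytes_per_elem=2):
    # Keys and values: 2 tensors per layer, each of shape tokens x heads x head_dim.
    return 2 * layers * tokens * heads * head_dim * bytes_per_elem

def latent_kv_bytes(tokens, layers, latent_dim, rope_dim, bytes_per_elem=2):
    # One shared low-rank content latent plus one decoupled positional key per token.
    return layers * tokens * (latent_dim + rope_dim) * bytes_per_elem

# Hypothetical long-video setting in 16-bit precision (2 bytes per element).
std = standard_kv_bytes(tokens=100_000, layers=40, heads=40, head_dim=128)
lat = latent_kv_bytes(tokens=100_000, layers=40, latent_dim=512, rope_dim=64)
print(f"standard: {std / 1e9:.1f} GB, latent: {lat / 1e9:.1f} GB, ratio: {std / lat:.1f}x")
```

Under these made-up sizes the cache shrinks from about 81.9 GB to about 4.6 GB (roughly 17.8x), because the cache no longer grows with the number of heads.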
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-22 00:00

One-Forcing: Towards Stable One-Step Autoregressive Video Generation

One-Forcing improves one-step video generation quality and efficiency by combining the DMD objective with a GAN loss, achieving state-of-the-art results at reduced training cost.
arXiv cs.CV TIER_1 English(EN) · Chensheng Dai, Shengjun Zhang, Yifan Li, Zhang Zhang, Zheng Zhu, Yueqi Duan · 2026-06-05 04:00

RhymeFlow: Training-Free Acceleration for Video Generation with Asynchronous Denoising Flow Scheduling

arXiv:2606.06309v1 Announce Type: new Abstract: Video generation models based on Diffusion Transformers (DiTs) have achieved remarkable performance in video synthesis, yet they suffer from high inference latency and computational costs due to the quadratic complexity of 3D attent…
arXiv cs.CV TIER_1 English(EN) · Yueqi Duan · 2026-06-04 15:49

RhymeFlow: Training-Free Acceleration for Video Generation with Asynchronous Denoising Flow Scheduling

Video generation models based on Diffusion Transformers (DiTs) have achieved remarkable performance in video synthesis, yet they suffer from high inference latency and computational costs due to the quadratic complexity of 3D attention. Existing acceleration methods primarily red…
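A back-of-the-envelope calculation shows why full 3D attention dominates inference cost: the token count is the product of latent frames and patchified height and width, and full attention scores every token against every other. The latent grid sizes, the patch size, and the helper names below are hypothetical.

```python
def num_tokens(latent_frames, latent_h, latent_w, patch=2):
    # Spatial patchification only; a temporal patch size of 1 is assumed.
    return latent_frames * (latent_h // patch) * (latent_w // patch)

def attention_pairs(tokens):
    # Full 3D attention computes a score for every (query, key) pair.
    return tokens * tokens

short_clip = num_tokens(latent_frames=21, latent_h=60, latent_w=104)
long_clip = num_tokens(latent_frames=42, latent_h=60, latent_w=104)
print(short_clip, long_clip)
print(attention_pairs(long_clip) / attention_pairs(short_clip))
```

Doubling the clip length doubles the token count (32,760 to 65,520 here) but quadruples the number of attention scores, which is why acceleration methods target the attention and denoising schedule rather than the linear layers alone.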
arXiv cs.CV TIER_1 English(EN) · Xiaoxuan He, Siming Fu, Zeyue Xue, Weijie Wang, Ruizhe He, Yuming Li, Dacheng Yin, Shuai Dong, Haoyang Huang, Hongfa Wang, Nan Duan, Bohan Zhuang · 2026-06-04 04:00

Flash-GRPO: Efficient Alignment of Video Diffusion via Single-Step Policy Optimization

arXiv:2605.15980v2 Announce Type: replace Abstract: Group Relative Policy Optimization has emerged as essential for aligning video diffusion models with human preferences, but faces a critical computational bottleneck: training a 14B parametered model typically demands hundreds o…
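Group Relative Policy Optimization (GRPO) replaces a learned value baseline with statistics from a group of samples generated for the same prompt: each sample's reward is normalized by the group mean and standard deviation. The sketch below shows only this advantage computation, which GRPO variants share; it does not model Flash-GRPO's single-step optimization. The reward values and the function name are made up, and the population standard deviation is an assumption (implementations differ).

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one prompt's group of samples."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical preference-model rewards for four videos sampled from one prompt.
advantages = group_relative_advantages([0.2, 0.5, 0.8, 0.5])
print([round(a, 3) for a in advantages])
```

Samples better than their group average receive positive advantages and worse ones negative, so no separate critic network has to be trained.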
arXiv cs.CV TIER_1 English(EN) · Thanh-Tung Le, Yunhan Zhao, Menglei Chai, Zhengyang Shen, Zhe Cao, Danhang Tang, Xiaohui Xie, Deying Kong · 2026-06-04 04:00

DSA: Dynamic Step Allocation for Fast Autoregressive Video Generation

arXiv:2606.04432v1 Announce Type: new Abstract: Video diffusion transformers have achieved state-of-the-art visual quality, but their high inference cost remains a major bottleneck for real-time applications. Recent distillation frameworks produce autoregressive video diffusion m…
arXiv cs.CV TIER_1 English(EN) · Deying Kong · 2026-06-03 04:25

DSA: Dynamic Step Allocation for Fast Autoregressive Video Generation

Video diffusion transformers have achieved state-of-the-art visual quality, but their high inference cost remains a major bottleneck for real-time applications. Recent distillation frameworks produce autoregressive video diffusion models with reduced latency, yet these models sti…
arXiv cs.CV TIER_1 English(EN) · Haobo Li, Yanhong Zeng, Yunhong Lu, Jiapeng Zhu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Yujun Shen, Zhipeng Zhang · 2026-06-03 04:00

AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation

arXiv:2606.03972v1 Announce Type: new Abstract: We present AAD-1, an Asymmetric Adversarial Distillation framework for One-step autoregressive image-to-video generation. State-of-the-art methods adopt adversarial distillation but suffer from motion collapse and training instabili…
arXiv cs.CV TIER_1 English(EN) · Yonghao Yu, Lang Huang, Runyi Li, Zerun Wang, Toshihiko Yamasaki · 2026-06-03 04:00

Video-Mirai: Autoregressive Video Diffusion Models Need Foresight

arXiv:2606.03971v1 Announce Type: new Abstract: Causal video generators must predict from the past, but they need not learn only from it. In streaming autoregressive video diffusion, each emitted segment becomes a commitment that future segments must preserve. Standard training, …
arXiv cs.CV TIER_1 English(EN) · Zhipeng Zhang · 2026-06-02 17:55

AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation

We present AAD-1, an Asymmetric Adversarial Distillation framework for One-step autoregressive image-to-video generation. State-of-the-art methods adopt adversarial distillation but suffer from motion collapse and training instability, resulting in static videos. AAD-1 addresses …
arXiv cs.CV TIER_1 English(EN) · Toshihiko Yamasaki · 2026-06-02 17:55

Video-Mirai: Autoregressive Video Diffusion Models Need Foresight

Causal video generators must predict from the past, but they need not learn only from it. In streaming autoregressive video diffusion, each emitted segment becomes a commitment that future segments must preserve. Standard training, however, only asks each causal state to explain …
arXiv cs.CV TIER_1 English(EN) · Hovhannes Margaryan, Quentin Bammey, Christian Sandor · 2026-06-02 04:00

FlowC2S: Flowing from Current to Subsequent Frames for Fast and Memory-Efficient Video Continuation

arXiv:2604.17625v2 Announce Type: replace Abstract: This paper introduces a novel methodology for generating fast and memory-efficient video continuations. Our method, dubbed FlowC2S, fine-tunes a pre-trained text-to-video flow model to learn a vector field between the current an…
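Flow models of this kind are commonly trained with a flow-matching objective on interpolants between two endpoints. The sketch below shows the generic straight-line construction, using the current clip as the source endpoint and the next clip as the target; representing clips as flat lists of floats, the linear path, and all function names are simplifying assumptions, not details taken from the abstract.

```python
def interpolate(x_current, x_next, t):
    # Point on the straight path from the current clip (t=0) to the next clip (t=1).
    return [(1.0 - t) * a + t * b for a, b in zip(x_current, x_next)]

def target_velocity(x_current, x_next):
    # On a linear path the velocity is constant: dx/dt = x_next - x_current.
    return [b - a for a, b in zip(x_current, x_next)]

def flow_matching_loss(predicted_v, x_current, x_next):
    # Mean squared error between a model's predicted velocity and the target.
    target = target_velocity(x_current, x_next)
    return sum((p - q) ** 2 for p, q in zip(predicted_v, target)) / len(target)

def euler_sample(velocity_fn, x_current, steps=4):
    # Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-size Euler steps.
    x = list(x_current)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

x_cur, x_nxt = [0.0, 1.0], [1.0, 3.0]
oracle = lambda x, t: target_velocity(x_cur, x_nxt)  # stands in for a trained network
print(interpolate(x_cur, x_nxt, 0.5), euler_sample(oracle, x_cur))
```

Starting from the current clip rather than from pure noise is one way such a method could need fewer integration steps, since the path to the continuation is shorter.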
arXiv cs.CV TIER_1 English(EN) · Yiming Zhao · 2026-06-02 04:00

Boundary-Protected W8A8 HiFloat8 Quantization for Large-Scale Text-to-Video Diffusion Transformers

arXiv:2606.00957v1 Announce Type: new Abstract: We present a post-training quantization (PTQ) approach for Wan2.1-T2V-14B, a 14-billion-parameter text-to-video diffusion transformer, targeting the W8A8 HiFloat8 (HiF8) format on Ascend 910B NPUs. A central challenge in quantizing …
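W8A8 post-training quantization stores both weights and activations as 8-bit codes with calibrated scales, accumulating products in higher precision and rescaling once. The sketch below uses symmetric per-tensor INT8 as a stand-in because the abstract does not describe the HiFloat8 encoding; it illustrates only the quantize, integer-accumulate, and rescale round trip, not the paper's boundary-protection scheme. The function names and the sample vectors are hypothetical.

```python
def symmetric_scale(values, qmax=127):
    # Map the largest magnitude to qmax; fall back to 1.0 for an all-zero tensor.
    amax = max(abs(v) for v in values)
    return amax / qmax if amax > 0 else 1.0

def quantize(values, scale, qmax=127):
    # Round to the nearest integer code and clamp to [-qmax, qmax].
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]

def dequantize(codes, scale):
    return [c * scale for c in codes]

def w8a8_dot(weights, activations):
    # Quantize both operands, accumulate in integers, then rescale once.
    sw, sa = symmetric_scale(weights), symmetric_scale(activations)
    qw, qa = quantize(weights, sw), quantize(activations, sa)
    acc = sum(x * y for x, y in zip(qw, qa))
    return acc * sw * sa

w = [0.5, -1.27, 0.01, 1.0]
a = [2.0, 0.5, -3.0, 1.0]
exact = sum(x * y for x, y in zip(w, a))
print(quantize(w, symmetric_scale(w)), exact, w8a8_dot(w, a))
```

Because a single large value sets the scale for the whole tensor, outliers coarsen the grid for every other value, which is one reason PTQ methods for large diffusion transformers treat sensitive parts of the model specially.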
arXiv cs.CV TIER_1 English(EN) · Min Zhao, Hongzhou Zhu, Kaiwen Zheng, Zihan Zhou, Bokai Yan, Xinyuan Li, Xiao Yang, Chongxuan Li, Jun Zhu · 2026-06-01 04:00

Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation

arXiv:2605.15141v2 Announce Type: replace Abstract: Real-time interactive video generation requires low-latency, streaming, and controllable rollout. Existing autoregressive (AR) diffusion distillation methods have achieved strong results in the chunk-wise 4-step regime by distil…
