
New methods improve the quality and efficiency of autoregressive video generation

By the PulseAugur editorial team · [14 sources] · 2026-05-15 14:33

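Researchers are developing new methods to improve autoregressive video generation, with a focus on efficiency and quality. One method, One-Forcing, combines a DMD objective with a GAN loss to achieve stable, high-quality one-step video generation, and it outperforms existing one-step methods on benchmarks. Another technique, DySink, uses a retrieval-based framework with dynamic frame sinks to maintain adaptive long-range context and prevent collapse during long video generation. In addition, Adversarial Flow Distillation (AFD) offers an on-policy approach that distills heterogeneous black-box video generators into efficient autoregressive student models without requiring teacher scores.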
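To make the "frame sink" terminology concrete, here is a minimal Python sketch of the static baseline that the DySink abstract (listed among the sources) describes: a bounded cache that combines fixed early-frame sinks with a local window of recent frames. The function name `cached_frames` and the parameters `num_sinks` and `window` are illustrative assumptions, not names from any of the papers, and the sketch does not implement DySink's dynamic, retrieval-based selection.

```python
def cached_frames(t, num_sinks=2, window=4):
    """Return the frame indices kept in a bounded cache when generating frame t.

    Static policy (hypothetical illustration): always keep the first
    `num_sinks` frames as long-range anchors, plus the `window` most
    recent frames for short-term continuity.
    """
    # Early-frame sinks: frames 0 .. num_sinks-1, limited to frames that exist.
    sinks = list(range(min(num_sinks, t)))
    # Local window: the last `window` frames before t, without repeating sinks.
    recent_start = max(len(sinks), t - window)
    recent = list(range(recent_start, t))
    return sinks + recent


if __name__ == "__main__":
    for t in (3, 100):
        print(t, cached_frames(t))
    # 3 [0, 1, 2]
    # 100 [0, 1, 96, 97, 98, 99]
```

The cache never holds more than `num_sinks + window` frames, which is what keeps memory bounded. Under this static policy the same early frames stay cached no matter how far generation has moved on, which is the fixed allocation that the DySink abstract identifies as the limitation.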

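Impact: The new methods could enable more stable, efficient, and higher-quality video generation, potentially supporting new applications such as real-time interactive content and world simulation.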

Ranking rationale: Multiple research papers introduce new techniques for improving autoregressive video generation.


AI-generated summary · Google Gemini · based on 14 sources.

Sources [14]

arXiv cs.AI TIER_1 English(EN) · Jiaqi Feng, Justin Cui, Yuanhao Ban, Cho-Jui Hsieh · 2026-05-25 04:00

One-Forcing: Towards Stable One-Step Autoregressive Video Generation

arXiv:2605.23458v1 Announce Type: cross Abstract: Recent advances have substantially improved real-time interactive video generation in the autoregressive regime. However, most existing few-step autoregressive video generation methods, often distilled from a corresponding many-st…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-25 00:00

On-Policy Adversarial Flow Distillation for Autoregressive Video Generation

Adversarial Flow Distillation enables efficient distillation of heterogeneous video generation models by using on-policy feedback and forward-process flow-matching updates without requiring teacher scores or detailed trajectory information.

arXiv cs.AI TIER_1 English(EN) · Bo Ye, Xinyu Cui, Jian Zhao, Tong Wei, Min-Ling Zhang · 2026-05-22 04:00

DySink: Dynamic Frame Sinks for Autoregressive Long Video Generation

arXiv:2605.21028v1 Announce Type: cross Abstract: Autoregressive long video generation often adopts bounded-memory streaming for efficiency, typically combining local windows for short-term continuity with static early-frame sinks as long-range anchors. However, this fixed alloca…

arXiv cs.AI TIER_1 English(EN) · Min-Ling Zhang · 2026-05-20 11:01

DySink: Dynamic Frame Sinks for Autoregressive Long Video Generation

Autoregressive long video generation often adopts bounded-memory streaming for efficiency, typically combining local windows for short-term continuity with static early-frame sinks as long-range anchors. However, this fixed allocation keeps early frames cached even when the curre…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-19 01:28

PhyWorld: Physics-Faithful World Model for Video Generation

World simulators can provide safe and scalable environments for training Physical AI systems before real-world deployment. Large video generation models are emerging as a promising basis for such simulators because they can generate diverse and realistic visual futures. However, …

arXiv cs.CV TIER_1 English(EN) · Yang Luo, Shengju Qian, Xiaohang Tang, Zirui Zhu, Yong Liu, Xin Wang, Yang You · 2026-05-26 04:00

On-Policy Adversarial Flow Distillation for Autoregressive Video Generation

arXiv:2605.26105v1 Announce Type: new Abstract: Autoregressive video generators are attractive for streaming, long-horizon, and interactive applications, but distilling strong black-box teachers into causal students remains difficult. The student must learn under its own rollout …

arXiv cs.CV TIER_1 English(EN) · Yang You · 2026-05-25 17:58

On-Policy Adversarial Flow Distillation for Autoregressive Video Generation

Autoregressive video generators are attractive for streaming, long-horizon, and interactive applications, but distilling strong black-box teachers into causal students remains difficult. The student must learn under its own rollout distribution, whereas practical teachers may exp…

arXiv cs.CV TIER_1 English(EN) · Cho-Jui Hsieh · 2026-05-22 10:16

One-Forcing: Towards Stable One-Step Autoregressive Video Generation

Recent advances have substantially improved real-time interactive video generation in the autoregressive regime. However, most existing few-step autoregressive video generation methods, often distilled from a corresponding many-step teacher, default to a 4-step sampling configura…

arXiv cs.CV TIER_1 English(EN) · Hongzhou Zhu, Min Zhao, Guande He, Hang Su, Chongxuan Li, Jun Zhu · 2026-05-22 04:00

Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation

arXiv:2602.02214v3 Announce Type: replace Abstract: To achieve real-time interactive video generation, current methods distill pretrained bidirectional video diffusion models into few-step autoregressive (AR) models, facing an architectural gap when full attention is replaced by …

arXiv cs.CV TIER_1 English(EN) · Sheng Li, Yang Sui, Junhao Ran, Bo Yuan, Yue Dai, Xulong Tang · 2026-05-22 04:00

Temporal-Aware Pruning for Efficient Diffusion-Based Video Generation

arXiv:2605.17837v2 Announce Type: replace Abstract: Video diffusion models have recently enabled high-quality video generation with ViT-based architectures, but remain computationally intensive because generation requires attention computation over long spatiotemporal sequences. …

arXiv cs.CV TIER_1 English(EN) · Linfeng Zhang · 2026-05-20 11:24

Dynamic Video Generation: Shaping Video Generation in Space and Time

Diffusion models have achieved impressive performance in video generation, but their iterative denoising process remains computationally expensive due to the large number of tokens processed at each timestep. Recently, progressive resolution sampling has emerged as a promising ac…

arXiv cs.CV TIER_1 English(EN) · Jong Chul Ye · 2026-05-20 08:55

FlowLong: Inference-Time Long Video Generation via Manifold-Constrained Tweedie Matching

Extending the generation horizon of video diffusion models to long sequences remains a long-standing and important challenge. Existing training-free approaches fall into two categories: extensions of bidirectional models, which are tightly coupled to specific architectures and su…

arXiv cs.CV TIER_1 English(EN) · K. Huang · 2026-05-18 11:28

Enhancing Training-Free Infinite-Frame Generation for Consistent Long Videos

Without incurring significant computational overhead, train-free long video generation aims to enable foundation video generation models to produce longer videos. Frame-level autoregressive frameworks, e.g., FIFO-diffusion, offer the advantage of generating infinitely long videos…

arXiv cs.CV TIER_1 English(EN) · Chuanguang Yang · 2026-05-15 14:33

Echo-Forcing: A Scene Memory Framework for Interactive Long Video Generation

Autoregressive video diffusion models enable open-ended generation through local attention and KV caching. However, existing training-free long-video optimization methods mainly focus on stable extension under a single prompt, making them difficult to handle interactive scenarios…
