PulseAugur
Q-ARVD: Quantizing Autoregressive Video Diffusion Models

New methods improve the efficiency and quality of video diffusion models

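Researchers have developed several new techniques for video diffusion models, focusing on efficiency and quality. One method, LocalDPO, optimizes preference alignment at the level of local spatio-temporal regions to improve video fidelity and coherence. Another, ARL2, replaces quadratic self-attention with a fixed-size recurrent state, giving linear-time scaling and constant memory use, which speeds up generation and reduces memory requirements. ORBIS is a hardware-software co-designed accelerator that uses output activations to compute more accurate inter-token similarity, enabling higher token-reduction rates along with significant speedups and lower energy consumption. Finally, Bernini unifies multimodal large language models (MLLMs) with diffusion models, using MLLMs for semantic planning and diffusion models for pixel rendering, and achieves state-of-the-art performance in video generation and editing.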
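The fixed-size recurrent state attributed to ARL2 above is the defining property of causal linear attention. The sketch below is a generic illustration, not ARL2's architecture: it assumes the feature map φ(x) = elu(x) + 1 and plain Python lists, and covers only the "remember linearly" half (the paper's title suggests this memory is paired with local attention, which is omitted here). The state `S` (a d_k × d_v matrix) and normalizer `z` never grow with sequence length, yet each step reproduces what an all-pairs computation over every earlier token gives.

```python
import math


def phi(x):
    """Positive feature map elu(x) + 1 (an assumed choice for this sketch)."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


class LinearAttentionState:
    """Fixed-size memory for causal linear attention.

    S accumulates phi(k) v^T (a d_k x d_v matrix) and z accumulates phi(k);
    neither grows with the number of tokens processed.
    """

    def __init__(self, d_k, d_v):
        self.S = [[0.0] * d_v for _ in range(d_k)]
        self.z = [0.0] * d_k

    def step(self, q, k, v):
        # Fold the current key/value into memory first (causal: token t sees itself).
        fk = phi(k)
        for i, fki in enumerate(fk):
            self.z[i] += fki
            for j, vj in enumerate(v):
                self.S[i][j] += fki * vj
        # Read out phi(q)^T S / phi(q)^T z.
        fq = phi(q)
        denom = sum(a * b for a, b in zip(fq, self.z))
        return [sum(fq[i] * self.S[i][j] for i in range(len(fq))) / denom
                for j in range(len(v))]


def linear_attention_all_pairs(qs, ks, vs):
    """Reference: same outputs, but each step revisits every earlier key."""
    outputs = []
    for t, q in enumerate(qs):
        fq = phi(q)
        weights = [sum(a * b for a, b in zip(fq, phi(ks[s]))) for s in range(t + 1)]
        total = sum(weights)
        outputs.append([sum(w * vs[s][j] for s, w in enumerate(weights)) / total
                        for j in range(len(vs[0]))])
    return outputs


# Usage: three toy tokens with d_k = 2 and d_v = 3.
qs = [[0.1, -0.2], [0.3, 0.0], [-0.5, 0.4]]
ks = [[0.2, 0.1], [-0.1, 0.3], [0.0, -0.2]]
vs = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0], [0.0, 1.0, 1.0]]

state = LinearAttentionState(d_k=2, d_v=3)
recurrent = [state.step(q, k, v) for q, k, v in zip(qs, ks, vs)]
reference = linear_attention_all_pairs(qs, ks, vs)
for r_row, ref_row in zip(recurrent, reference):
    assert all(abs(a - b) < 1e-9 for a, b in zip(r_row, ref_row))
```

Each `step` costs O(d_k · d_v) no matter how many tokens came before, while the reference function does O(t) work at step t by revisiting every earlier key, the same all-pairs pattern that makes softmax attention quadratic over a growing sequence.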
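Similarity-based token reduction, the mechanism the ORBIS summary describes, can be illustrated with a generic greedy merge. This is a hypothetical sketch, not ORBIS's distribution-aware matching or its hardware design: a token joins the most similar existing group when cosine similarity exceeds a threshold, each group is averaged into one token for the expensive computation, and an `assignment` list scatters results back to the full sequence. The similarity features are passed separately from the tokens so that they can come from output activations, as the summary says ORBIS uses; how those activations are obtained is not specified here and is left to the caller.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def reduce_tokens(tokens, sim_features, threshold):
    """Greedy similarity-based merging (hypothetical helper, not ORBIS's algorithm).

    Token i joins the most similar existing group if that similarity exceeds
    `threshold`; otherwise it starts a new group. Similarity is computed on
    `sim_features`, while the merged values are averaged from `tokens`.
    Returns (reduced_tokens, assignment), where assignment[i] is token i's group.
    """
    representatives = []  # similarity feature of each group's first token
    groups = []           # token indices belonging to each group
    assignment = []
    for i, feat in enumerate(sim_features):
        best_group, best_sim = -1, threshold
        for g, rep in enumerate(representatives):
            s = cosine(feat, rep)
            if s > best_sim:
                best_group, best_sim = g, s
        if best_group == -1:
            representatives.append(feat)
            groups.append([i])
            best_group = len(groups) - 1
        else:
            groups[best_group].append(i)
        assignment.append(best_group)
    dim = len(tokens[0])
    reduced = [[sum(tokens[i][d] for i in members) / len(members) for d in range(dim)]
               for members in groups]
    return reduced, assignment


def restore(reduced_outputs, assignment):
    """Scatter per-group outputs back to the original sequence length."""
    return [list(reduced_outputs[g]) for g in assignment]


# Usage: four toy tokens forming two near-duplicate pairs.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.05, 0.95]]
reduced, assignment = reduce_tokens(tokens, sim_features=tokens, threshold=0.95)
restored = restore(reduced, assignment)
print(assignment)                   # [0, 0, 1, 1]
print(len(reduced), len(restored))  # 2 4
```

The threshold trades accuracy for speed: a lower value merges more tokens (a higher reduction rate) at the risk of averaging tokens that are not truly redundant, which is why the quality of the similarity signal matters.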

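Impact: These advances in video diffusion models promise more efficient, higher-quality video generation and could affect the creative industries and AI-driven content creation.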

Ranking rationale: This cluster contains multiple research papers detailing novel methods and architectures for video diffusion models.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 6 sources.

Sources [6]

  1. arXiv cs.AI TIER_1 English(EN) · Zitong Huang, Kaidong Zhang, Yukang Ding, Chao Gao, Rui Ding, Ying Chen, Wangmeng Zuo ·

    Mind the Generative Details: Direct Localized Detail Preference Optimization for Video Diffusion Models

    arXiv:2601.04068v4 Announce Type: replace-cross Abstract: Aligning text-to-video diffusion models with human preferences is crucial for generating high-quality videos. Existing Direct Preference Optimization (DPO) methods rely on multi-sample ranking and task-specific critic model…

  2. arXiv cs.LG TIER_1 English(EN) · Kunyang Li, Mubarak Shah, Yuzhang Shang ·

    Attend Locally, Remember Linearly: Linear Attention as Cross-Frame Memory for Autoregressive Video Diffusion

    arXiv:2605.16579v2 Announce Type: replace-cross Abstract: Autoregressive (AR) video diffusion is a powerful paradigm for streaming and interactive video generation. However, its reliance on softmax self-attention leads to quadratic compute complexity in sequence length and memory…

  3. Hugging Face Daily Papers TIER_1 English(EN) ·

    Q-ARVD: Quantizing Autoregressive Video Diffusion Models

    Autoregressive video diffusion models incur high inference costs that limit practical deployment. Q-ARVD is a novel quantization framework that addresses the frame-wise sensitivity imbalance and weight outlier patterns specific to these models.

  4. arXiv cs.CV TIER_1 English(EN) · Hangyeol Lee, Joo-Young Kim ·

    ORBIS: Output-Guided Token Reduction with Distribution-Aware Matching for Video Diffusion Acceleration

    arXiv:2605.22015v1 Announce Type: new Abstract: Diffusion Transformer (DiT) has emerged as a powerful model architecture for generating high-quality images and videos. In the case of video DiT, 3D Spatio-Temporal Attention increases token length in proportion to the number of fra…

  5. arXiv cs.CV TIER_1 English(EN) · Bernini Team, Chenchen Liu, Junyi Chen, Lei Li, Lu Chi, Mingzhen Sun, Zhuoying Li, Yi Fu, Ruoyu Guo, Yiheng Wu, Ge Bai, Zehuan Yuan ·

    Bernini: Latent Semantic Planning for Video Diffusion

    arXiv:2605.22344v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) and diffusion models have each reached remarkable maturity: MLLMs excel at reasoning over heterogeneous multimodal inputs with strong semantic grounding, while diffusion models synthesize ima…

  6. arXiv cs.CV TIER_1 English(EN) · Zehuan Yuan ·

    Bernini: Latent Semantic Planning for Video Diffusion

    Multimodal large language models (MLLMs) and diffusion models have each reached remarkable maturity: MLLMs excel at reasoning over heterogeneous multimodal inputs with strong semantic grounding, while diffusion models synthesize images and videos with photorealistic fidelity. We …
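The Q-ARVD entry above names frame-wise sensitivity imbalance as one difficulty in quantizing autoregressive video diffusion models. A minimal sketch of why one shared scale can hurt, assuming plain symmetric uniform quantization (not Q-ARVD's actual scheme, and ignoring the weight-outlier side): when one frame's values are much smaller than another's, a per-tensor scale rounds the small frame to zero, while per-frame scales preserve it. The frame values and 4-bit width are made up for illustration.

```python
def quantize_symmetric(values, n_bits):
    """Uniform symmetric quantization with a single scale for all `values`."""
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    return [qi * scale for qi in q]


def relative_error(original, restored):
    """Sum of absolute errors divided by the sum of absolute original values."""
    num = sum(abs(a - b) for a, b in zip(original, restored))
    den = sum(abs(a) for a in original)
    return num / den if den else 0.0


def frame_errors(frames, n_bits, per_frame):
    """Relative error of each frame with one shared scale or one scale per frame."""
    if per_frame:
        errors = []
        for frame in frames:
            q, scale = quantize_symmetric(frame, n_bits)
            errors.append(relative_error(frame, dequantize(q, scale)))
        return errors
    flat = [v for frame in frames for v in frame]
    q, scale = quantize_symmetric(flat, n_bits)
    restored = dequantize(q, scale)
    errors, start = [], 0
    for frame in frames:
        errors.append(relative_error(frame, restored[start:start + len(frame)]))
        start += len(frame)
    return errors


# Usage: a low-magnitude frame next to a high-magnitude frame, 4-bit quantization.
frames = [[0.01, -0.02, 0.015, 0.005], [5.0, -3.0, 4.0, -6.0]]
shared = frame_errors(frames, n_bits=4, per_frame=False)
separate = frame_errors(frames, n_bits=4, per_frame=True)
print(shared[0])          # 1.0: every value in the small frame rounds to zero
print(separate[0] < 0.1)  # True
```

The large frame is unaffected because its own range sets the shared scale; per-frame scales cost one extra stored scale per frame, and how Q-ARVD actually allocates precision across frames may differ from this sketch.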