New Method Guides Video Models Toward Better Composition

By the PulseAugur editorial team · [1 source] · 2026-05-14 15:50

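Researchers have introduced CVG, a method for improving the compositional understanding of text-to-video diffusion models. The technique runs at inference time and uses the model's internal cross-attention maps to steer the denoising process. A lightweight classifier trained on these attention features guides video generation toward the desired composition, without changing the underlying model architecture or requiring user-provided controls. Experiments show improved prompt faithfulness and visual quality on compositional benchmarks.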
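The general idea of steering a latent with a classifier's gradient can be illustrated with a toy sketch. This is not the paper's implementation: the linear map `ATTN_PROJ` is a hypothetical stand-in for the model's cross-attention features, the logistic classifier weights are made-up values, and the update rule is ordinary classifier guidance (a gradient step on the log-probability), which the summary does not confirm as CVG's exact rule. In a real pipeline this nudge would presumably be interleaved with the denoiser's own update at each step.

```python
import math

# Hypothetical stand-in for cross-attention: features are a linear map of the latent.
ATTN_PROJ = [[0.5, -0.2, 0.1],
             [0.0, 0.3, -0.4]]
# Made-up weights for a lightweight logistic classifier over those features.
WEIGHTS = [1.0, -2.0]
BIAS = -0.5


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_features(latent):
    """Toy 'attention features': one dot product per row of ATTN_PROJ."""
    return [sum(w * z for w, z in zip(row, latent)) for row in ATTN_PROJ]


def composition_prob(latent):
    """Classifier's probability that the latent matches the desired composition."""
    feats = attention_features(latent)
    logit = sum(w * f for w, f in zip(WEIGHTS, feats)) + BIAS
    return sigmoid(logit)


def guided_step(latent, scale):
    """Move the latent along the gradient of log p(composition | features)."""
    p = composition_prob(latent)
    # d(logit)/d(latent_j) = sum_i WEIGHTS[i] * ATTN_PROJ[i][j]
    grad_logit = [
        sum(WEIGHTS[i] * ATTN_PROJ[i][j] for i in range(len(WEIGHTS)))
        for j in range(len(latent))
    ]
    # d(log sigmoid(logit))/d(latent) = (1 - p) * d(logit)/d(latent)
    return [z + scale * (1.0 - p) * g for z, g in zip(latent, grad_logit)]


latent = [0.1, 0.2, -0.3]
before = composition_prob(latent)
for _ in range(5):  # a few guidance steps; the denoiser update is omitted here
    latent = guided_step(latent, scale=0.5)
after = composition_prob(latent)
```

In this toy setup the base "model" (`ATTN_PROJ`) is never modified; only the latent is nudged, which mirrors the inference-time, architecture-preserving character described above.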

Impact: Improves compositional understanding in text-to-video models, which could increase realism and adherence to complex prompts.

Ranking rationale: Academic paper introducing a new method that improves existing models.

Read on arXiv cs.CV →

text-to-video diffusion models

AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.CV TIER_1 Italiano(IT) · Lior Wolf · 2026-05-14 15:50

Compositional Video Generation via Inference-Time Guidance

Text-to-video diffusion models generate realistic videos, but often fail on prompts requiring fine-grained compositional understanding, such as relations between entities, attributes, actions, and motion directions. We hypothesize that these failures need not be addressed by retr…

