AI Models Improve Procedural Planning and Video Generation

By the PulseAugur editorial team · [2 sources] · 2026-05-18 13:42

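Researchers have developed new methods that improve procedural planning and video generation by grounding them in instructional content and physical principles. One method, RECIPE, uses reinforcement learning with a grounded quality reward to train a model on a large, noisy corpus of instructional videos, strengthening its ability to generate step-by-step plans. Another system, NEWTON, treats video generation as an agentic task: it coordinates a set of physics-aware tools and uses a verifier to drive iterative replanning, with the aim of improving physical commonsense in the generated videos.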
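The verify-and-replan pattern described for NEWTON can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`generate_with_replanning`, `toy_plan`, `toy_render`, `toy_verify`), the `Attempt` record, and the string-based "video" are all hypothetical stand-ins, assumed here only to show how verifier feedback might feed into the next planning round.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    plan: str
    video: str
    passed: bool
    feedback: str


def generate_with_replanning(
    prompt: str,
    plan_fn: Callable[[str, str], str],
    render_fn: Callable[[str], str],
    verify_fn: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> List[Attempt]:
    """Run plan -> render -> verify, passing verifier feedback to the next plan."""
    history: List[Attempt] = []
    feedback = ""
    for _ in range(max_rounds):
        plan = plan_fn(prompt, feedback)      # hypothetical planner
        video = render_fn(plan)               # hypothetical video generator
        passed, feedback = verify_fn(video)   # hypothetical physics verifier
        history.append(Attempt(plan, video, passed, feedback))
        if passed:
            break
    return history


# Toy stand-ins (assumptions for illustration only): the "video" is a string,
# and the verifier passes once the plan mentions gravity.
def toy_plan(prompt: str, feedback: str) -> str:
    return prompt if not feedback else f"{prompt} [constraint: {feedback}]"


def toy_render(plan: str) -> str:
    return f"video<{plan}>"


def toy_verify(video: str) -> Tuple[bool, str]:
    if "gravity" in video:
        return True, ""
    return False, "objects must fall under gravity"


history = generate_with_replanning(
    "a ball rolls off a table", toy_plan, toy_render, toy_verify
)
for attempt in history:
    print(attempt.passed, "|", attempt.plan)
```

In this toy run the first attempt fails verification, and the verifier's feedback is folded into the second plan, which then passes. The `max_rounds` budget is an assumed safeguard so the loop stops even if the verifier never accepts a result.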

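Impact: These methods could lead to more capable AI assistants that can understand and generate complex procedural tasks and physically realistic videos.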

Ranking rationale: Two research papers introduce novel approaches to AI-driven procedural planning and video generation.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.CV TIER_1 English(EN) · Lorenzo Torresani · 2026-05-19 15:20

RECIPE: Procedural Planning via Grounding in Instructional Video

Visual planning asks a model to generate the remaining steps of a procedure in natural language given a partial video context and a goal. Progress on this task is bottlenecked by annotation: clean labeled datasets are small, domain-narrow, and encode a single execution trajectory…

arXiv cs.CV TIER_1 English(EN) · Shujun Wang · 2026-05-18 13:42

NEWTON: Agentic Planning for Physically Grounded Video Generation

Video generation models produce visually compelling results but systematically violate physical commonsense -- on VideoPhy-2, the best model achieves only 32.6% joint accuracy. We identify a specification bottleneck: text prompts are lossy compression of the physical world, omitt…
