PulseAugur

AI models improve procedural planning and video generation

Two new arXiv papers ground procedural planning in instructional video and video generation in physical principles. RECIPE uses reinforcement learning with a grounding-quality reward to train models on large, noisy instructional video corpora, improving their ability to generate step-by-step plans. NEWTON frames video generation as an agentic task: it orchestrates physics-aware tools and uses a verifier to drive iterative re-planning, with the aim of improving physical commonsense in generated videos.
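The verifier-driven re-planning idea attributed to NEWTON can be illustrated with a generic plan, generate, verify loop. The sketch below is a toy under stated assumptions, not the paper's implementation: every name (`replan_loop`, `toy_planner`, `toy_generator`, `toy_verifier`), the score threshold, and the round limit are hypothetical, and the "video" is just a string standing in for generated output.

```python
from typing import Callable, List, Tuple

# Hypothetical types for illustration only; none of these names come from the paper.
Planner = Callable[[str, str], List[str]]    # (prompt, feedback) -> plan steps
Generator = Callable[[List[str]], str]       # plan -> "video" (a string stand-in)
Verifier = Callable[[str], Tuple[float, str]]  # video -> (score, feedback)


def replan_loop(prompt: str, planner: Planner, generator: Generator,
                verifier: Verifier, threshold: float = 0.8,
                max_rounds: int = 3) -> Tuple[str, float]:
    """Re-plan until the verifier score reaches the threshold or rounds run out."""
    feedback = ""
    best_video, best_score = "", float("-inf")
    for _ in range(max_rounds):
        plan = planner(prompt, feedback)      # re-plan using the last feedback
        video = generator(plan)               # produce a candidate
        score, feedback = verifier(video)     # check it and collect feedback
        if score > best_score:
            best_video, best_score = video, score
        if score >= threshold:
            break
    return best_video, best_score


# Toy stand-ins (assumptions): the verifier only checks for the word "gravity".
def toy_planner(prompt: str, feedback: str) -> List[str]:
    return [prompt] + (["constraint: " + feedback] if feedback else [])


def toy_generator(plan: List[str]) -> str:
    return " | ".join(plan)


def toy_verifier(video: str) -> Tuple[float, str]:
    return (1.0, "") if "gravity" in video else (0.3, "missing gravity")


video, score = replan_loop("a ball falls", toy_planner, toy_generator, toy_verifier)
```

In this toy run the first candidate fails verification, the feedback is folded into the next plan, and the second candidate passes, which is the basic shape of iterative re-planning.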

Impact: These methods could lead to more capable AI assistants that can plan complex procedural tasks and generate physically realistic videos.

Ranking rationale: Two research papers introduce novel methods for AI-driven procedural planning and video generation.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 2 sources. How we write summaries →

Sources [2]

  1. arXiv cs.CV · TIER_1 · English (EN) · Lorenzo Torresani

    RECIPE: Procedural Planning via Grounding in Instructional Video

    Visual planning asks a model to generate the remaining steps of a procedure in natural language given a partial video context and a goal. Progress on this task is bottlenecked by annotation: clean labeled datasets are small, domain-narrow, and encode a single execution trajectory…

  2. arXiv cs.CV · TIER_1 · English (EN) · Shujun Wang

    NEWTON: Agentic Planning for Physically Grounded Video Generation

    Video generation models produce visually compelling results but systematically violate physical commonsense -- on VideoPhy-2, the best model achieves only 32.6% joint accuracy. We identify a specification bottleneck: text prompts are lossy compression of the physical world, omitt…