PulseAugur
MotiMotion: Motion-Controlled Video Generation with Visual Reasoning

New frameworks enhance video generation with advanced reasoning capabilities

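Researchers have developed new frameworks that enhance video generation by integrating advanced reasoning capabilities. MotiMotion uses a vision-language model to predict plausible secondary motion and adjusts its guidance according to confidence, refining motion control. VChain integrates visual reasoning from multimodal models to generate keyframes that guide a video generator, improving the synthesis of complex, multi-step scenes. CogOmniControl focuses on understanding the creative intent that users express through abstract conditions, and uses specialized models trained on professional data to generate videos that match that intent.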

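Impact: These advances in reasoning-driven video generation could enable more realistic and more controllable video synthesis for creative and professional applications.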

Ranking rationale: Multiple research papers introduce new video generation frameworks and benchmarks for enhanced reasoning.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 7 sources.

Sources [7]

  1. Hugging Face Daily Papers TIER_1 English(EN) ·

    MotiMotion: Motion-Controlled Video Generation with Visual Reasoning

    MotiMotion introduces a reasoning-then-generation framework for motion-controlled video generation that improves plausibility through vision-language reasoning and confidence-aware control mechanisms.

  2. arXiv cs.CV TIER_1 English(EN) · Adil Meric, Lin Geng Foo, Mert Kiray, Benjamin Busam, Rishabh Dabral, Christian Theobalt ·

    CoMoGen: COntrollable MOtion Dynamics and Interactions with Mask-Guided Video GENeration

    arXiv:2605.22996v1 Announce Type: new Abstract: We present CoMoGen, a controllable video generation framework that generates realistic interactive dynamics from a single binary mask sequence conditioned on an input image. CoMoGen introduces a lightweight MaskAdapter that encodes …

  3. arXiv cs.CV TIER_1 English(EN) · Jiarong Liang, Max Ku, Ka-Hei Hui, Ping Nie, Wenhu Chen ·

    VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction

    arXiv:2602.13294v3 Announce Type: replace Abstract: Evaluating whether Multimodal Large Language Models (MLLMs) genuinely reason about physical dynamics remains challenging. Most existing benchmarks rely on recognition-style protocols such as Visual Question Answering (VQA) and V…

  4. arXiv cs.CV TIER_1 English(EN) · Ziqi Huang, Ning Yu, Gordon Chen, Haonan Qiu, Paul Debevec, Ziwei Liu ·

    VChain: Chain-of-Visual-Thought for Reasoning in Video Generation

    arXiv:2510.05094v2 Announce Type: replace Abstract: Recent video generation models can produce smooth and visually appealing clips, but they often struggle to synthesize complex dynamics with a coherent chain of consequences. Accurately modeling visual outcomes and state transiti…

  5. arXiv cs.CV TIER_1 English(EN) · Lee Hsin-Ying, Hanwen Jiang, Yiqun Mei, Jing Shi, Ming-Hsuan Yang, Zhixin Shu ·

    MotiMotion: Motion-Controlled Video Generation with Visual Reasoning

    arXiv:2605.22818v1 Announce Type: new Abstract: Current motion-controlled image-to-video generation models rigidly follow user-provided trajectories that are often sparse, imprecise, and causally incomplete. Such reliance often yields unnatural or implausible outcomes, especially…

  6. arXiv cs.CV TIER_1 English(EN) · Zhixin Shu ·

    MotiMotion: Motion-Controlled Video Generation with Visual Reasoning

    Current motion-controlled image-to-video generation models rigidly follow user-provided trajectories that are often sparse, imprecise, and causally incomplete. Such reliance often yields unnatural or implausible outcomes, especially by missing secondary causal consequences. To ad…

  7. arXiv cs.CV TIER_1 English(EN) · Jianbing Shen ·

    CogOmniControl: Reasoning-Driven Controllable Video Generation via Creative Intent Cognition

    Recent diffusion models achieve strong photorealism and fluency in video generation, yet remain fragile under abstract, sparse or complex conditions, leading to poor performance in professional production workflows such as storyboard sketches and clay render conditions. Existing …