MotiMotion: Motion-Controlled Video Generation with Visual Reasoning
New frameworks enhance video generation with advanced reasoning capabilities
By the PulseAugur editorial team · [7 sources]
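Researchers have developed new frameworks that enhance video generation by integrating advanced reasoning capabilities. MotiMotion uses a vision-language model to predict plausible secondary motion and adjusts its guidance according to confidence, which improves motion control. VChain integrates visual reasoning from multimodal models to generate keyframes that guide a video generator, improving the synthesis of complex, multi-step scenes. CogOmniControl focuses on understanding the creative intent that users express through abstract conditions, and uses specialized models trained on professional data to generate videos that match that intent.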
AI
MotiMotion introduces a reasoning-then-generation framework for motion-controlled video generation that improves plausibility through vision-language reasoning and confidence-aware control mechanisms.
arXiv cs.CV
TIER_1 · English (EN) · Adil Meric, Lin Geng Foo, Mert Kiray, Benjamin Busam, Rishabh Dabral, Christian Theobalt
arXiv:2605.22996v1 Announce Type: new Abstract: We present CoMoGen, a controllable video generation framework that generates realistic interactive dynamics from a single binary mask sequence conditioned on an input image. CoMoGen introduces a lightweight MaskAdapter that encodes …
arXiv cs.CV
TIER_1 · English (EN) · Jiarong Liang, Max Ku, Ka-Hei Hui, Ping Nie, Wenhu Chen
arXiv:2602.13294v3 Announce Type: replace Abstract: Evaluating whether Multimodal Large Language Models (MLLMs) genuinely reason about physical dynamics remains challenging. Most existing benchmarks rely on recognition-style protocols such as Visual Question Answering (VQA) and V…
arXiv cs.CV
TIER_1 · English (EN) · Ziqi Huang, Ning Yu, Gordon Chen, Haonan Qiu, Paul Debevec, Ziwei Liu
arXiv:2510.05094v2 Announce Type: replace Abstract: Recent video generation models can produce smooth and visually appealing clips, but they often struggle to synthesize complex dynamics with a coherent chain of consequences. Accurately modeling visual outcomes and state transiti…
arXiv:2605.22818v1 Announce Type: new Abstract: Current motion-controlled image-to-video generation models rigidly follow user-provided trajectories that are often sparse, imprecise, and causally incomplete. Such reliance often yields unnatural or implausible outcomes, especially by missing secondary causal consequences. To ad…
Recent diffusion models achieve strong photorealism and fluency in video generation, yet remain fragile under abstract, sparse or complex conditions, leading to poor performance in professional production workflows such as storyboard sketches and clay render conditions. Existing …