New frameworks enhance video generation with keyframes, motion control, and reasoning

By PulseAugur Editorial · [11 sources] · 2026-05-19 15:29

Researchers have introduced several new frameworks for advanced video generation, focusing on enhanced control and realism. SmartDirector utilizes multiple keyframes to guide cinematic video creation, improving narrative structure and temporal pacing. MotiMotion addresses limitations in motion-controlled video by integrating visual reasoning to refine trajectories and predict secondary effects, aiming for more natural and plausible outcomes. PostCam offers a streamlined approach to novel-view video generation with precise camera editing, while CamC2V integrates 3D constraints and multiple image conditions for context-aware generation. CoMoGen generates realistic interactive dynamics from mask sequences, and VChain employs a chain-of-visual-thought process using multimodal models to guide video generation at critical moments.

IMPACT These advancements in controllable and reasoning-driven video generation could lead to more sophisticated AI-powered content creation tools and simulations.

RANK_REASON Multiple research papers introducing new frameworks for video generation.


AI-generated summary · Google Gemini · from 11 sources.

COVERAGE [11]

arXiv cs.AI TIER_1 English(EN) · Zhida Zhang, Jie Ma, Zhan Peng, Haoxue Wu, Yang Han, Jun Liang, Jie Cao, Jing Li · 2026-05-28 04:00

SmartDirector: Keyframe-Conditioned Cinematic Video Generation with Narrative Pacing Control

arXiv:2605.27891v1 · Announce Type: cross · Abstract: The narrative quality of a video fundamentally determines its perceptual value. Although existing video generation methods can produce visually appealing content, they predominantly rely on sparse conditioning signals such as text…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-27 00:00

SmartDirector: Keyframe-Conditioned Cinematic Video Generation with Narrative Pacing Control

SmartDirector enhances video generation by using multiple keyframes to improve narrative structure and temporal pacing through a two-stage process of low-resolution generation and high-resolution refinement.

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-21 00:00

MotiMotion: Motion-Controlled Video Generation with Visual Reasoning

MotiMotion introduces a reasoning-then-generation framework for motion-controlled video generation that improves plausibility through vision-language reasoning and confidence-aware control mechanisms.

arXiv cs.CV TIER_1 English(EN) · Yipeng Chen, Zhichao Ye, Zhenzhou Fang, Xinyu Chen, Xiaoyu Zhang, Jialing Liu, Nan Wang, Guofeng Zhang, Haomin Liu · 2026-06-01 04:00

PostCam: Camera-Controllable Novel-View Video Generation with Query-Shared Cross-Attention

arXiv:2511.17185v2 · Announce Type: replace · Abstract: We propose PostCam, a streamlined framework for novel-view video generation that achieves superior detail preservation and precise camera trajectory editing in dynamic scenes. Current methods often struggle with a trade-off betw…

arXiv cs.CV TIER_1 English(EN) · Luis Denninger, Sina Mokhtarzadeh Azar, Juergen Gall · 2026-05-29 04:00

CamC2V: Context-aware Controllable Video Generation

arXiv:2504.06022v3 · Announce Type: replace · Abstract: Recently, image-to-video (I2V) diffusion models have demonstrated impressive scene understanding and generative quality, incorporating image conditions to guide generation. However, these models primarily animate static images w…

arXiv cs.CV TIER_1 English(EN) · Adil Meric, Lin Geng Foo, Mert Kiray, Benjamin Busam, Rishabh Dabral, Christian Theobalt · 2026-05-25 04:00

CoMoGen: COntrollable MOtion Dynamics and Interactions with Mask-Guided Video GENeration

arXiv:2605.22996v1 · Announce Type: new · Abstract: We present CoMoGen, a controllable video generation framework that generates realistic interactive dynamics from a single binary mask sequence conditioned on an input image. CoMoGen introduces a lightweight MaskAdapter that encodes …

arXiv cs.CV TIER_1 English(EN) · Jiarong Liang, Max Ku, Ka-Hei Hui, Ping Nie, Wenhu Chen · 2026-05-22 04:00

VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction

arXiv:2602.13294v3 · Announce Type: replace · Abstract: Evaluating whether Multimodal Large Language Models (MLLMs) genuinely reason about physical dynamics remains challenging. Most existing benchmarks rely on recognition-style protocols such as Visual Question Answering (VQA) and V…

arXiv cs.CV TIER_1 English(EN) · Ziqi Huang, Ning Yu, Gordon Chen, Haonan Qiu, Paul Debevec, Ziwei Liu · 2026-05-22 04:00

VChain: Chain-of-Visual-Thought for Reasoning in Video Generation

arXiv:2510.05094v2 · Announce Type: replace · Abstract: Recent video generation models can produce smooth and visually appealing clips, but they often struggle to synthesize complex dynamics with a coherent chain of consequences. Accurately modeling visual outcomes and state transiti…

arXiv cs.CV TIER_1 English(EN) · Lee Hsin-Ying, Hanwen Jiang, Yiqun Mei, Jing Shi, Ming-Hsuan Yang, Zhixin Shu · 2026-05-22 04:00

MotiMotion: Motion-Controlled Video Generation with Visual Reasoning

arXiv:2605.22818v1 · Announce Type: new · Abstract: Current motion-controlled image-to-video generation models rigidly follow user-provided trajectories that are often sparse, imprecise, and causally incomplete. Such reliance often yields unnatural or implausible outcomes, especially…

arXiv cs.CV TIER_1 English(EN) · Zhixin Shu · 2026-05-21 17:59

MotiMotion: Motion-Controlled Video Generation with Visual Reasoning

Current motion-controlled image-to-video generation models rigidly follow user-provided trajectories that are often sparse, imprecise, and causally incomplete. Such reliance often yields unnatural or implausible outcomes, especially by missing secondary causal consequences. To ad…

arXiv cs.CV TIER_1 English(EN) · Jianbing Shen · 2026-05-19 15:29

CogOmniControl: Reasoning-Driven Controllable Video Generation via Creative Intent Cognition

Recent diffusion models achieve strong photorealism and fluency in video generation, yet remain fragile under abstract, sparse or complex conditions, leading to poor performance in professional production workflows such as storyboard sketches and clay render conditions. Existing …
