Auteur: Language-Driven Cinematographic Framing for Human-Centric Video Generation
Auteur is a method for generating human-centric video with language-driven cinematographic framing. Unlike previous approaches that treat camera motion as a byproduct, Auteur parameterizes camera control relative to the actor's pose and motion. A fine-tuned multimodal large language model translates natural language descriptions and human motion into camera keyframes, which are then interpolated into continuous camera trajectories for video generators. The result is more intentional, professional-looking camera work in generative video, and the system outperforms existing methods on new framing-focused metrics.
AI IMPACT Enables more sophisticated and director-controlled camera work in generative video.
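To make the keyframe-to-trajectory step concrete, here is a minimal sketch of how actor-relative camera keyframes could be interpolated and then placed in world space. The summary does not describe Auteur's actual parameterization or interpolation scheme, so everything here is an assumption for illustration: the `Keyframe` fields (azimuth, elevation, distance), the linear interpolation, the y-up coordinate convention, and the function names are all hypothetical.

```python
import math
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Keyframe:
    """Hypothetical actor-relative camera keyframe (not Auteur's real format)."""
    t: float          # time in seconds
    azimuth: float    # degrees around the actor, 0 = directly in front
    elevation: float  # degrees above the actor's horizontal plane
    distance: float   # distance from the actor, in scene units


def interpolate(keys, t):
    """Linearly interpolate keyframe parameters at time t.

    Assumes keys are sorted by time; t outside the range is clamped.
    """
    times = [k.t for k in keys]
    if t <= times[0]:
        return keys[0]
    if t >= times[-1]:
        return keys[-1]
    i = bisect_right(times, t) - 1
    a, b = keys[i], keys[i + 1]
    u = (t - a.t) / (b.t - a.t)

    def lerp(x, y):
        return x + u * (y - x)

    return Keyframe(
        t,
        lerp(a.azimuth, b.azimuth),
        lerp(a.elevation, b.elevation),
        lerp(a.distance, b.distance),
    )


def camera_world_position(key, actor_pos, actor_heading_deg):
    """Convert actor-relative parameters to a world-space camera position.

    Assumed convention: y is up, and a heading of 0 degrees faces +z.
    """
    yaw = math.radians(actor_heading_deg + key.azimuth)
    el = math.radians(key.elevation)
    horizontal = key.distance * math.cos(el)
    x = actor_pos[0] + horizontal * math.sin(yaw)
    y = actor_pos[1] + key.distance * math.sin(el)
    z = actor_pos[2] + horizontal * math.cos(yaw)
    return (x, y, z)


# Example: a push-in that also arcs from the front of the actor to the side.
keys = [Keyframe(0.0, 0.0, 0.0, 4.0), Keyframe(2.0, 90.0, 0.0, 2.0)]
mid = interpolate(keys, 1.0)
position = camera_world_position(mid, (0.0, 0.0, 0.0), 0.0)
```

Because the parameters are expressed relative to the actor, the same keyframes yield a different world-space trajectory whenever the actor moves or turns, which is the point of framing the shot around the performer rather than around fixed scene coordinates.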