CineDance: Towards Next-Generation Multi-Shot Long-Form Cinematic Audio-Video Generation
Researchers have introduced CineDance-1M, a large-scale dataset intended to improve the cinematic narrative capabilities of open-source text-to-audio-video generation. Its long-form videos average 92.8 seconds and 24.2 shots each, and come with structured audio-video annotations produced by a three-stage curation process. The authors also propose CineBench, a new metric system for evaluating complex audio-video narratives, and demonstrate an adapted LTX-2.3 model that shows strong alignment and consistency.
AI IMPACT: Provides a foundational dataset and evaluation tools to accelerate open-source research in long-form cinematic audio-video generation.