PulseAugur

New models enhance controllable video generation with discrete actions and multi-agent consistency

Researchers have introduced two new approaches to controllable video world models. DisCo replaces continuous camera trajectories with discrete action primitives to improve control over camera motion. Prisma-World addresses multi-agent video generation, enforcing cross-view consistency through a joint geometry-aware denoising process, and introduces a new dataset for training and evaluation.

IMPACT These advancements in controllable video generation could enable more realistic and interactive virtual environments for training and simulation.

RANK_REASON The cluster contains two research papers introducing new models and datasets for video generation.

AI-generated summary · Google Gemini · from 4 sources.

COVERAGE [4]

  1. Hugging Face Daily Papers TIER_1 English(EN) ·

    WorldCraft: From Camera Navigation to Object Manipulation in Interactive Video World Models

    WorldCraft extends interactive video world models to enable object-level trajectory control while maintaining camera navigation capabilities through specialized control pipelines.

  2. arXiv cs.CV TIER_1 English(EN) · Hongrui Huang, Junke Wang, Quanhao Li, Yu-Gang Jiang, Zuxuan Wu ·

    DisCo: World Models with Discrete Camera Motion Control

    arXiv:2606.07967v1. Abstract: Controllable video world models target interactive world exploration, where models must faithfully execute explicit action commands while preserving visual quality and temporal coherence. However, most existing approaches rely on co…

  3. arXiv cs.CV TIER_1 English(EN) · Huiqiang Sun, Zhan Peng, Size Wu, Kun Wang, Kang Liao, Dianyi Wang, Xingyu Zeng, Sheng Jin, Yangguang Li, Zhiguo Cao, Ziwei Liu, Wei Li ·

    Prisma-World: Camera-Controllable Multi-Agent Video World Model

    arXiv:2606.09507v1. Abstract: Video world models have made rapid progress in generating controllable visual experiences, but most of them still simulate the world from a single observer. Extending such models to multiple agents raises a central challenge: if eac…

  4. arXiv cs.CV TIER_1 English(EN) · Wei Li ·

    Prisma-World: Camera-Controllable Multi-Agent Video World Model

    Video world models have made rapid progress in generating controllable visual experiences, but most of them still simulate the world from a single observer. Extending such models to multiple agents raises a central challenge: if each agent's future state is generated independentl…