
New models enhance controllable video generation with discrete actions and multi-agent consistency

By the PulseAugur editorial team · [4 sources] · 2026-05-24 00:00

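Researchers have developed new methods for building controllable video world models. WorldCraft extends interactive video world models from camera navigation to object-level trajectory control. DisCo uses discrete action primitives to improve control over camera motion, addressing the problems associated with continuous trajectories. Prisma-World tackles multi-agent video generation by enforcing cross-view consistency through a joint geometry-aware denoising process, and introduces a new dataset for training and evaluation.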
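To make the idea of discrete camera action primitives more concrete, here is a minimal, hypothetical Python sketch. It is not DisCo's method: the primitive names, step sizes, the planar pose model, and the names `Pose`, `PRIMITIVES`, and `rollout` are all illustrative assumptions. It only shows how a small, fixed vocabulary of commands can be rolled out deterministically into a camera trajectory, instead of specifying that trajectory as free-form continuous values.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """Hypothetical planar camera pose: ground-plane position (x, y) and heading yaw in radians.

    yaw = 0 faces +x; positive yaw turns counterclockwise (to the left) when viewed from above.
    """
    x: float
    y: float
    yaw: float


# Hypothetical primitive vocabulary (an assumption, not taken from DisCo):
# each token maps to a fixed (forward_step, yaw_step) pair.
PRIMITIVES = {
    "forward": (0.5, 0.0),
    "backward": (-0.5, 0.0),
    "turn_left": (0.0, math.radians(15)),
    "turn_right": (0.0, -math.radians(15)),
    "stay": (0.0, 0.0),
}


def rollout(actions, start=Pose(0.0, 0.0, 0.0)):
    """Convert a sequence of discrete primitives into camera poses.

    Returns len(actions) + 1 poses, beginning with `start`.
    Raises ValueError for any token outside the vocabulary.
    """
    poses = [start]
    pose = start
    for action in actions:
        if action not in PRIMITIVES:
            raise ValueError(f"unknown primitive: {action!r}")
        step, d_yaw = PRIMITIVES[action]
        # Rotate first, then translate along the new heading.
        yaw = pose.yaw + d_yaw
        pose = Pose(pose.x + step * math.cos(yaw), pose.y + step * math.sin(yaw), yaw)
        poses.append(pose)
    return poses
```

For example, `rollout(["turn_left"] * 6 + ["forward"])` turns the camera 90 degrees to the left and then moves it 0.5 units along +y. Because every command comes from a fixed set, inputs can be validated and reproduced exactly; this is one plausible intuition for why discrete primitives can ease control, not a description of the paper's design.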

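Impact: These advances in controllable video generation could provide more realistic and more interactive virtual environments for training and simulation.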

Ranking rationale: This cluster contains three research papers that introduce new models and a dataset for video generation.


AI-generated summary · Google Gemini · from 4 sources.

Sources [4]

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-24 00:00

WorldCraft: From Camera Navigation to Object Manipulation in Interactive Video World Models

WorldCraft extends interactive video world models to enable object-level trajectory control while maintaining camera navigation capabilities through specialized control pipelines.
arXiv cs.CV TIER_1 English(EN) · Hongrui Huang, Junke Wang, Quanhao Li, Yu-Gang Jiang, Zuxuan Wu · 2026-06-09 04:00

DisCo: A World Model with Discrete Camera Motion Control

arXiv:2606.07967v1 Announce Type: new Abstract: Controllable video world models target interactive world exploration, where models must faithfully execute explicit action commands while preserving visual quality and temporal coherence. However, most existing approaches rely on co…
arXiv cs.CV TIER_1 English(EN) · Huiqiang Sun, Zhan Peng, Size Wu, Kun Wang, Kang Liao, Dianyi Wang, Xingyu Zeng, Sheng Jin, Yangguang Li, Zhiguo Cao, Ziwei Liu, Wei Li · 2026-06-09 04:00

Prisma-World: A Multi-Agent Video World Model with Controllable Cameras

arXiv:2606.09507v1 Announce Type: new Abstract: Video world models have made rapid progress in generating controllable visual experiences, but most of them still simulate the world from a single observer. Extending such models to multiple agents raises a central challenge: if eac…
arXiv cs.CV TIER_1 English(EN) · Wei Li · 2026-06-08 13:59

Prisma-World: A Multi-Agent Video World Model with Controllable Cameras

Video world models have made rapid progress in generating controllable visual experiences, but most of them still simulate the world from a single observer. Extending such models to multiple agents raises a central challenge: if each agent's future state is generated independentl…
