DiLA: Disentangled Latent Action World Models

DiLA advances self-supervised world models through disentangled learning

By the PulseAugur editorial team · [1 source] · 2026-05-15 08:22

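Researchers have developed DiLA, a novel disentangled latent action world model designed to improve video generation and action abstraction. DiLA addresses the trade-off between action abstraction and generation fidelity by routing visual detail into a content pathway and spatial layout into a structure pathway. This disentanglement yields a continuous, semantically structured latent action space without sacrificing generation quality, and the authors report superior performance on video generation, action transfer, and visual planning.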

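Impact: Introduces a new framework for self-supervised world model learning that could improve video generation and planning capabilities.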

Ranking rationale: Academic paper detailing a new model architecture and its performance.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CV · Si Wu · 2026-05-15 08:22

DiLA: Disentangled Latent Action World Models

Latent Action Models (LAMs) enable the learning of world models from unlabeled video by inferring abstract actions between consecutive frames. However, LAMs face a fundamental trade-off between action abstraction and generation fidelity. Existing methods typically circumvent this…
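To make the abstraction-versus-fidelity trade-off in the abstract concrete, here is a toy sketch of a generic latent action model. It is not DiLA's architecture, which the truncated abstract does not describe. It assumes frames are flat lists of floats, and it uses block averaging as a stand-in for a learned encoder; the names `infer_latent_action`, `predict_next_frame`, and `reconstruction_error` are hypothetical.

```python
import random


def infer_latent_action(frame_t, frame_next, action_dim):
    """Inverse dynamics: compress the frame-to-frame change into action_dim numbers.

    Block averaging stands in for a learned encoder (an assumption of this sketch).
    action_dim must divide the frame length.
    """
    diff = [b - a for a, b in zip(frame_t, frame_next)]
    block = len(diff) // action_dim
    return [sum(diff[i * block:(i + 1) * block]) / block for i in range(action_dim)]


def predict_next_frame(frame_t, action):
    """Forward dynamics: rebuild the next frame from the current frame and the action."""
    block = len(frame_t) // len(action)
    return [x + action[i // block] for i, x in enumerate(frame_t)]


def reconstruction_error(frame_t, frame_next, action_dim):
    """Mean squared error of the predicted next frame for a given action size."""
    action = infer_latent_action(frame_t, frame_next, action_dim)
    pred = predict_next_frame(frame_t, action)
    return sum((p - y) ** 2 for p, y in zip(pred, frame_next)) / len(frame_next)


# Two synthetic "frames" of 8 values each; no action labels are used anywhere.
rng = random.Random(0)
frame_t = [rng.random() for _ in range(8)]
frame_next = [x + rng.uniform(-0.5, 0.5) for x in frame_t]

# A smaller latent action is more abstract but reconstructs the next frame less well.
errors = {k: reconstruction_error(frame_t, frame_next, k) for k in (1, 2, 4, 8)}
for k, err in errors.items():
    print(f"action_dim={k}: mse={err:.6f}")
```

In this toy, shrinking `action_dim` forces the action to summarize the change more coarsely, so the reconstruction error cannot decrease; at `action_dim=8` the action carries the full frame difference and the error is essentially zero. That is the tension the abstract attributes to LAMs.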

