DiLA model advances self-supervised world models with disentangled learning

By PulseAugur Editorial · [1 source] · 2026-05-15 08:22

Researchers have developed DiLA, a Disentangled Latent Action world model designed to improve video generation and action abstraction. DiLA addresses the trade-off between action abstraction and generation fidelity by routing visual details through a content pathway and spatial layouts through a structure pathway. This disentanglement allows for a continuous, semantically structured latent action space without sacrificing generative quality, and the paper reports stronger performance in video generation, action transfer, and visual planning.

IMPACT Introduces a new framework for self-supervised world model learning, potentially improving video generation and planning capabilities.

RANK_REASON Academic paper detailing a new model architecture and its performance.



AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Si Wu · 2026-05-15 08:22

DiLA: Disentangled Latent Action World Models

Latent Action Models (LAMs) enable the learning of world models from unlabeled video by inferring abstract actions between consecutive frames. However, LAMs face a fundamental trade-off between action abstraction and generation fidelity. Existing methods typically circumvent this…
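The trade-off described in the abstract can be made concrete with a toy sketch. The code below is not DiLA's architecture and is not taken from the paper; it assumes frames are short lists of numbers and uses hypothetical function names. It only illustrates the general latent-action idea: one step infers an abstract action from two consecutive frames, another step predicts the next frame from the current frame and that action. Squeezing the action into a single number makes it very abstract, but detail that does not fit through the bottleneck is lost in the prediction.

```python
# Toy illustration only: NOT the DiLA method. All names here are hypothetical.

def infer_latent_action(frame_t, frame_next):
    """Inverse-dynamics step: compress the frame-to-frame change into one number."""
    deltas = [b - a for a, b in zip(frame_t, frame_next)]
    return sum(deltas) / len(deltas)  # 1-D bottleneck, i.e. a very abstract action


def predict_next_frame(frame_t, latent_action):
    """Forward-dynamics step: apply the latent action to the current frame."""
    return [a + latent_action for a in frame_t]


def reconstruction_error(predicted, target):
    """Mean squared error between the predicted and the true next frame."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)


# Case 1: every value shifts by the same amount, so the 1-D action captures it fully.
frame_a = [0.0, 1.0, 2.0, 3.0]
frame_b = [1.0, 2.0, 3.0, 4.0]
z = infer_latent_action(frame_a, frame_b)
uniform_error = reconstruction_error(predict_next_frame(frame_a, z), frame_b)

# Case 2: only one value changes, so the 1-D action cannot represent the detail.
frame_c = [0.0, 1.0, 2.0, 7.0]
z_c = infer_latent_action(frame_a, frame_c)
detail_error = reconstruction_error(predict_next_frame(frame_a, z_c), frame_c)

print(uniform_error, detail_error)
```

In this toy setting both cases yield the same latent action, yet only the uniform shift is reconstructed exactly. That is the tension between abstraction and fidelity in miniature; how DiLA's content and structure pathways resolve it is described in the paper itself, not in this sketch.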

