WAM4D: Fast 4D World Action Model via Spatial Register Tokens
Researchers have developed WAM4D, a 4D world action model that aims to improve robot manipulation by incorporating 3D spatial constraints. Unlike previous models that operate in 2D or latent spaces, WAM4D uses lightweight spatial register tokens to transfer geometric priors into a causal transformer. The register branch is removed after training, which keeps action inference efficient, while causal mixture attention prevents the model from exploiting non-causal shortcuts. Experiments on the RoboTwin 2.0 dataset and on real-world tasks indicate that WAM4D improves spatial consistency and the efficiency of action prediction.
IMPACT: WAM4D's efficient inference and improved spatial consistency could accelerate the development of more capable robotic systems for complex manipulation tasks.