Geometric Action Model for Robot Policy Learning
Researchers have introduced two new models for robot policy learning that leverage foundation models for improved performance. LaWAM (Latent World Action Model) uses compact latent visual subgoals to predict future scene states, achieving state-of-the-art success rates on various benchmarks with significantly lower latency than pixel-space models. The Geometric Action Model (GAM) repurposes a geometric foundation model for perception, prediction, and action decoding, directly incorporating 3D geometry for manipulation tasks and outperforming existing baselines in accuracy, robustness, speed, and efficiency.
AI IMPACT: These models advance robot learning by integrating foundation models for more efficient and accurate control, potentially accelerating real-world robotic applications.