Brief · PulseAugur

RESEARCH · arXiv cs.LG · 1d · [4 sources]

Geometric Action Model for Robot Policy Learning

Researchers have introduced two robot policy learning models that build on foundation models to improve performance. LaWAM (Latent World Action Model) uses compact latent visual subgoals to predict future scene states, achieving state-of-the-art success rates on several benchmarks at significantly lower latency than pixel-space models. The Geometric Action Model (GAM) repurposes a geometric foundation model for perception, prediction, and action decoding, incorporating 3D geometry directly into manipulation tasks and outperforming existing baselines in accuracy, robustness, speed, and efficiency.

IMPACT These models advance robot learning by integrating foundation models for more efficient and accurate control, potentially accelerating real-world robotic applications.

arXiv
VLAs
Vision-Language Action Models
LIBERO
World-Action Models
RoboTwin
Latent World Models For Intrinsically Motivated Exploration
Geometric Action Model
Geometric Foundation Model
video world-action models
Latent World Action Model