GAE: Unleashing Physical Potential of VLM with Generalizable Action Expert
Researchers have developed Generalizable Action Expert (GAE), a model that improves how vision-language models (VLMs) translate high-level plans into precise robot actions. GAE is a task-agnostic component that converts sparse geometric plans predicted by a VLM into continuous action trajectories. Decoupling reasoning from action generation in this way is intended to improve generalization. GAE is pre-trained on a large dataset of robot trajectories and uses an Action Pre-training, Pointcloud Fine-tuning (APPF) scheme for efficiency.
AI IMPACT: This research could lead to more capable robots that better understand and execute complex instructions.
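
To make the division of labor concrete, here is a toy sketch of the interface the summary describes: a planner emits a few sparse 3D waypoints, and a separate component turns them into a dense trajectory. It is not GAE itself. The learned action expert is replaced with plain linear interpolation, and the names `densify_plan`, `sparse_plan`, and `steps_per_segment` are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def densify_plan(waypoints: Sequence[Point], steps_per_segment: int = 10) -> List[Point]:
    """Turn sparse 3D waypoints into a dense trajectory.

    Assumption: linear interpolation stands in for the learned action
    expert, which the summary does not describe in detail.
    """
    if steps_per_segment < 1:
        raise ValueError("steps_per_segment must be at least 1")
    if len(waypoints) < 2:
        return [tuple(float(c) for c in p) for p in waypoints]

    # Start at the first waypoint, then append interpolated points
    # segment by segment, ending each segment exactly on its waypoint.
    trajectory: List[Point] = [tuple(float(c) for c in waypoints[0])]
    for start, end in zip(waypoints, waypoints[1:]):
        for k in range(1, steps_per_segment + 1):
            t = k / steps_per_segment
            trajectory.append(tuple(s + t * (e - s) for s, e in zip(start, end)))
    return trajectory


# Hypothetical sparse plan, as a VLM-style planner might output it.
sparse_plan = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.1), (0.2, 0.3, 0.1)]
dense = densify_plan(sparse_plan, steps_per_segment=5)
print(len(dense))  # 11 points: the start plus 5 per segment
```

Because the planner only has to produce the waypoints, the trajectory generator can be swapped or retrained without touching the reasoning side, which is the decoupling the summary credits for better generalization.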