Brief · PulseAugur

RESEARCH · arXiv cs.AI · English (EN) · 2d · [2 sources]

GEAR-VLA: Learning Geometry-Aware Action Representations for Generalizable Robotic Manipulation

Researchers have developed GEAR-VLA, a framework for improving how well Vision-Language-Action (VLA) models generalize in robotic manipulation. It targets limitations of current VLA models by learning unified, geometry-aware action representations. GEAR-VLA follows a coarse-to-fine learning strategy: it integrates embodied pretraining with a continuous action expert and aligns a 3D spatial backbone with the VLA representation. The framework also uses embodiment canonicalization to support cross-robot generalization. According to the authors, it achieves state-of-the-art performance on several benchmarks and high success rates on tasks involving unseen objects and different robot embodiments.
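
The brief does not explain how GEAR-VLA's embodiment canonicalization works. As a rough illustration of the general idea only, and not the paper's method, one common way to put different robots' actions on a shared scale is per-dimension min-max normalization into [-1, 1] using each robot's own action bounds. The `EmbodimentSpec` type, the helper names, and the example bounds below are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class EmbodimentSpec:
    """Hypothetical per-robot action bounds: one (low, high) pair per action dimension."""
    name: str
    low: Tuple[float, ...]
    high: Tuple[float, ...]


def _check_dims(values: Sequence[float], spec: EmbodimentSpec) -> None:
    if not (len(values) == len(spec.low) == len(spec.high)):
        raise ValueError(f"got {len(values)} dims, {spec.name} expects {len(spec.low)}")


def to_canonical(action: Sequence[float], spec: EmbodimentSpec) -> List[float]:
    """Min-max normalize a robot-specific action into a shared [-1, 1] range.

    Illustrative stand-in for canonicalization; not GEAR-VLA's actual procedure.
    """
    _check_dims(action, spec)
    canonical = []
    for a, lo, hi in zip(action, spec.low, spec.high):
        scaled = 2.0 * (a - lo) / (hi - lo) - 1.0
        canonical.append(min(1.0, max(-1.0, scaled)))  # clip out-of-range commands
    return canonical


def from_canonical(canonical: Sequence[float], spec: EmbodimentSpec) -> List[float]:
    """Invert to_canonical, mapping a shared-space action back to the robot's own units."""
    _check_dims(canonical, spec)
    return [lo + (c + 1.0) * 0.5 * (hi - lo)
            for c, lo, hi in zip(canonical, spec.low, spec.high)]


# Two hypothetical arms: same translation limits (metres), different gripper units.
arm_a = EmbodimentSpec("arm_a", low=(-0.05, -0.05, -0.05, 0.0), high=(0.05, 0.05, 0.05, 0.08))
arm_b = EmbodimentSpec("arm_b", low=(-0.05, -0.05, -0.05, 0.0), high=(0.05, 0.05, 0.05, 1.0))

# A zero translation with a half-open gripper maps to the same canonical action on both robots.
print(to_canonical([0.0, 0.0, 0.0, 0.04], arm_a))  # [0.0, 0.0, 0.0, 0.0]
print(to_canonical([0.0, 0.0, 0.0, 0.5], arm_b))   # [0.0, 0.0, 0.0, 0.0]
```

Per-dimension scaling only reconciles numeric ranges; robots that differ in coordinate frames or kinematics would need more than this, so treat the sketch as a minimal picture of what "a shared action space" can mean.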

IMPACT Could make VLA-based robotic manipulation more robust to unseen objects and easier to transfer across different robot embodiments.