Geometric Action Model for Robot Policy Learning
New robot policy models use foundation models to improve efficiency and accuracy
By the PulseAugur editorial team · [4 sources]
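Researchers have introduced two new robot policy learning models that leverage foundation models to improve performance. LaWAM (Latent World Action Model) uses compact latent visual subgoals to predict future scene states, achieving state-of-the-art success rates across a range of benchmarks with far lower latency than pixel-space models. The Geometric Action Model (GAM) repurposes a geometric foundation model for perception, prediction, and action decoding, directly integrating 3D geometry for manipulation tasks, and outperforms existing baselines in accuracy, robustness, speed, and efficiency.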
AI
arXiv:2606.15768v1 Announce Type: cross Abstract: Vision-Language-Action models (VLAs) leverage large-scale vision-language pretraining for semantic robot control, but often lack explicit foresight into how robot actions change the scene. World-Action Models (WAMs) address this l…
arXiv cs.LG
Jisang Han, Seonghu Jeon, Jaewoo Jung, René Zurbrügg, Honggyu An, Tifanny Portela, Marco Hutter, Marc Pollefeys, Seungryong Kim, Sunghwan Hong
arXiv:2606.17046v1 Announce Type: cross Abstract: Generalist robot policies must follow user instructions while reasoning about how objects, cameras, and robot actions interact in the 3D physical world. Recent vision-language-action models (VLAs) and video world-action models (WA…
A geometric action model leverages pretrained geometric foundation models to enable language-conditioned manipulation policies with improved accuracy, robustness, and efficiency in 3D physical environments.
Generalist robot policies must follow user instructions while reasoning about how objects, cameras, and robot actions interact in the 3D physical world. Recent vision-language-action models (VLAs) and video world-action models (WAMs) inherit strong semantic or temporal priors fro…