PulseAugur
English(EN) Geometric Action Model for Robot Policy Learning

New robot policy models use foundation models to improve efficiency and accuracy

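Researchers have introduced two new robot policy learning models that build on foundation models to improve performance. LaWAM (Latent World Action Models) predicts future scene states using compact latent visual subgoals, achieving state-of-the-art success rates across a range of benchmarks with far lower latency than pixel-space models. The Geometric Action Model (GAM) repurposes a geometric foundation model for perception, prediction, and action decoding, integrating 3D geometry directly into manipulation tasks and outperforming existing baselines in accuracy, robustness, speed, and efficiency.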

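Impact: These models advance robot learning by integrating foundation models for more efficient and accurate control, and may accelerate real-world robotics applications.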

Ranking rationale: Two research papers published on arXiv that introduce new models for robot policy learning.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 4 sources.

Sources [4]

  1. arXiv cs.AI TIER_1 English(EN) · Jialei Chen, Kai Wang, Kang Chen, Shuaihang Chen, Feng Gao, Wenhao Tang, Zhiyuan Li, Weilin Liu, Zhuyu Yao, Boxun Li, Yuanbo Xu, Chao Yu ·

    LaWAM: Latent World Action Models for Efficient Dynamics-Aware Robot Policies

    arXiv:2606.15768v1. Abstract: Vision-Language-Action models (VLAs) leverage large-scale vision-language pretraining for semantic robot control, but often lack explicit foresight into how robot actions change the scene. World-Action Models (WAMs) address this l…

  2. arXiv cs.LG TIER_1 English(EN) · Jisang Han, Seonghu Jeon, Jaewoo Jung, René Zurbrügg, Honggyu An, Tifanny Portela, Marco Hutter, Marc Pollefeys, Seungryong Kim, Sunghwan Hong ·

    Geometric Action Model for Robot Policy Learning

    arXiv:2606.17046v1. Abstract: Generalist robot policies must follow user instructions while reasoning about how objects, cameras, and robot actions interact in the 3D physical world. Recent vision-language-action models (VLAs) and video world-action models (WA…

  3. Hugging Face Daily Papers TIER_1 English(EN) ·

    Geometric Action Model for Robot Policy Learning

    A geometric action model leverages pretrained geometric foundation models to enable language-conditioned manipulation policies with improved accuracy, robustness, and efficiency in 3D physical environments.

  4. arXiv cs.CV TIER_1 English(EN) · Sunghwan Hong ·

    Geometric Action Model for Robot Policy Learning

    Generalist robot policies must follow user instructions while reasoning about how objects, cameras, and robot actions interact in the 3D physical world. Recent vision-language-action models (VLAs) and video world-action models (WAMs) inherit strong semantic or temporal priors fro…