GEAR-VLA: Learning Geometry-Aware Action Representations for Generalizable Robotic Manipulation

GEAR-VLA framework improves generalization in robotic manipulation

By the PulseAugur editorial team · [2 sources] · 2026-06-07 09:23

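Researchers have developed GEAR-VLA, a new framework designed to improve how well vision-language-action (VLA) models generalize in robotic manipulation tasks. The approach targets a limitation of current VLA models by learning a unified, geometry-aware action representation. GEAR-VLA follows a coarse-to-fine learning strategy: it combines embodied pretraining with a continuous action expert and aligns a 3D spatial backbone with the VLA representations. The framework also incorporates embodiment normalization to support cross-robot generalization. The authors report state-of-the-art performance on multiple benchmarks and high success rates on tasks involving unseen objects and different robot embodiments.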
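The summary does not say how GEAR-VLA's embodiment normalization actually works. For reference only, a common practice when training one policy on data from several robots is to rescale each robot's actions into a shared range using per-robot statistics, then invert that mapping before sending commands to the robot. The sketch below assumes simple per-dimension min/max bounds mapped to [-1, 1]; the names `ActionStats`, `fit_stats`, `normalize`, and `denormalize` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ActionStats:
    """Per-dimension action bounds for one robot embodiment (hypothetical helper)."""
    low: List[float]
    high: List[float]


def fit_stats(actions: Sequence[Sequence[float]]) -> ActionStats:
    """Compute per-dimension min/max over one robot's recorded actions."""
    dims = len(actions[0])
    low = [min(a[d] for a in actions) for d in range(dims)]
    high = [max(a[d] for a in actions) for d in range(dims)]
    return ActionStats(low, high)


def normalize(action: Sequence[float], stats: ActionStats) -> List[float]:
    """Map a raw action into [-1, 1] per dimension using that robot's bounds."""
    out = []
    for x, lo, hi in zip(action, stats.low, stats.high):
        span = hi - lo
        if span == 0:
            out.append(0.0)  # constant dimension: map to the center of the range
        else:
            z = 2.0 * (x - lo) / span - 1.0
            out.append(max(-1.0, min(1.0, z)))  # clip values outside the fitted range
    return out


def denormalize(z: Sequence[float], stats: ActionStats) -> List[float]:
    """Invert normalize(): map a model output in [-1, 1] back to robot units."""
    return [lo + (v + 1.0) * 0.5 * (hi - lo) for v, lo, hi in zip(z, stats.low, stats.high)]


if __name__ == "__main__":
    # Two hypothetical robots whose action dimensions have very different ranges.
    stats_a = fit_stats([[0.0, 10.0], [1.0, 20.0], [0.5, 15.0]])
    stats_b = fit_stats([[-3.0, 0.0], [3.0, 100.0]])
    print(normalize([0.5, 15.0], stats_a))   # [0.0, 0.0]
    print(normalize([3.0, 100.0], stats_b))  # [1.0, 1.0]
    print(denormalize([0.0, 0.0], stats_b))  # [0.0, 50.0]
```

Both robots end up in the same numeric range, so a single action head can be trained on their pooled data. Min/max bounds are sensitive to outliers, and quantile-based bounds are a frequent alternative; the summary does not state which, if either, GEAR-VLA uses.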

Impact: Improves generalization in robotic manipulation by making VLA models better at handling unseen objects and different robot embodiments.

Ranking rationale: This cluster contains an academic paper describing a new framework for robotic manipulation.


AI-generated summary · Google Gemini · from 2 sources.

Coverage sources [2]

arXiv cs.AI · Yuan Zhang, Shiqi Zhang, Yedong Shen, Shuai Dong, Jiajun Deng, Xin Zhang, Yuxuan Gao, Jiajia Wu, Xin Nie, Zhiyuan Cheng, Jianmin Ji, Yanyong Zhang, Xingyi Zhang, Jia Pan · 2026-06-09 04:00

GEAR-VLA: Learning Geometry-Aware Action Representations for Generalizable Robotic Manipulation

arXiv:2606.08530v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) models achieve strong benchmark performance but still struggle in real-world deployment with unseen objects, background shifts, and different robot embodiments. We argue that this stems from the lack o…
arXiv cs.AI · Jia Pan · 2026-06-07 09:23

GEAR-VLA: Learning Geometry-Aware Action Representations for Generalizable Robotic Manipulation

Vision-Language-Action (VLA) models achieve strong benchmark performance but still struggle in real-world deployment with unseen objects, background shifts, and different robot embodiments. We argue that this stems from the lack of a unified geometry-aware manipulation representa…
