Motion-Focused Latent Action Enables Cross-Embodiment VLA Training from Human EgoVideos

New framework trains AI action models on unlabeled human videos

By PulseAugur Editorial Team · [1 source] · 2026-07-03 04:00

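Researchers have developed a new framework that trains vision-language-action (VLA) models on unlabeled human videos. The system, named Motion-Focused Latent Action, uses a hybrid disentangled VQ-VAE to separate motion dynamics from background elements, producing a codebook of general-purpose action priors. This pre-training approach lets VLA models learn action intent from readily available human videos, substantially reducing the need for extensive annotated robot datasets during downstream adaptation.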
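To make the idea of a codebook concrete, here is a minimal sketch of the generic vector-quantization step that any VQ-VAE relies on: a continuous feature is replaced by its nearest codebook entry. This is not the paper's hybrid disentangled model. The function names, the toy codebook, and the feature values are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: generic nearest-neighbor vector quantization.
# This is NOT the paper's architecture; every name and value here is hypothetical.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(feature, codebook):
    """Return (index, code vector) of the codebook entry nearest to feature."""
    index = min(range(len(codebook)),
                key=lambda i: squared_distance(feature, codebook[i]))
    return index, codebook[index]


# A toy codebook of three 2-D "latent action" codes (made-up values).
codebook = [
    [0.0, 0.0],   # e.g. no motion
    [1.0, 0.0],   # e.g. motion along one axis
    [0.0, 1.0],   # e.g. motion along the other axis
]

# A made-up continuous motion feature, standing in for an encoder output.
motion_feature = [0.9, 0.2]

index, code = quantize(motion_feature, codebook)
print(index, code)
```

In a trained VQ-VAE the codebook entries are learned rather than hand-written, and the discrete index is what a downstream model would consume as an action token. How the paper separates motion from background before this step is not described in the summary, so it is not modeled here.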

Impact: By drawing on abundant unlabeled human video data, the approach enables more efficient training of robots and embodied AI models.

Ranking rationale: This is an academic paper detailing a new training method for AI models.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CV TIER_1 English(EN) · Runze Xu, Yiluo Zhang, Jian Wang, Yu Wang, Jincheng Yu · 2026-07-03 04:00

Motion-Focused Latent Action Enables Cross-Embodiment VLA Training from Human EgoVideos

arXiv:2606.18955v2 · Announce Type: replace · Abstract: Training generalist Vision-Language-Action (VLA) models typically requires massive, diverse robotic datasets with high-fidelity action annotations. While egocentric human manipulation videos are abundant and capture significant e…
