PulseAugur

New benchmark and framework advance reasoning for robot manipulation

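Researchers have introduced a new benchmark and a new framework for robot manipulation. WatchAct is a benchmark that evaluates how well a robot can reason about observed human behavior; it pairs video with language instructions to test event parsing, procedural reasoning, and intent inference. E-TTS, by contrast, is a test-time scaling framework that unifies reasoning scaling and action scaling for robot manipulation by incorporating historical context and iteratively refining outputs with a vision-language verifier. Both aim to improve robot performance on complex, long-horizon tasks, and E-TTS reports significant gains in both simulated and real-world settings without retraining.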
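The general idea of verifier-guided test-time scaling can be illustrated with a minimal Python sketch. This is not the E-TTS implementation: the function names (`select_action`, `make_toy_problem`), the scalar "action", and the toy scoring rule are all hypothetical stand-ins chosen for illustration. The sketch only shows the loop structure the summary describes: propose candidates conditioned on history, score them with a verifier, and refine around the best candidate over several rounds, all without updating any model weights.

```python
import random
from typing import Callable, List, Optional, Tuple

Action = float          # hypothetical stand-in for a real robot action
History = List[Action]  # hypothetical stand-in for past observations and actions

Proposer = Callable[[History, Optional[Action]], Action]
Verifier = Callable[[History, Action], float]


def select_action(
    propose: Proposer,
    verify: Verifier,
    history: History,
    num_candidates: int = 4,
    num_rounds: int = 3,
) -> Tuple[Action, float]:
    """Return the best-scoring candidate found over several refinement rounds."""
    if num_candidates < 1 or num_rounds < 1:
        raise ValueError("num_candidates and num_rounds must be at least 1")
    best_action: Optional[Action] = None
    best_score = float("-inf")
    for _ in range(num_rounds):
        for _ in range(num_candidates):
            # The proposer sees the history and the best candidate so far,
            # so later samples can refine earlier ones.
            candidate = propose(history, best_action)
            score = verify(history, candidate)
            if score > best_score:
                best_action, best_score = candidate, score
    assert best_action is not None
    return best_action, best_score


def make_toy_problem(seed: int = 0, target: float = 1.0) -> Tuple[Proposer, Verifier]:
    """Build a toy proposer and verifier (assumptions, not a real policy or VLM)."""
    rng = random.Random(seed)

    def propose(history: History, best_so_far: Optional[Action]) -> Action:
        # Sample near the best candidate, else near the last action in history.
        if best_so_far is not None:
            center = best_so_far
        else:
            center = history[-1] if history else 0.0
        return center + rng.gauss(0.0, 0.5)

    def verify(history: History, action: Action) -> float:
        # Higher is better; a real verifier would score against images and text.
        return -abs(action - target)

    return propose, verify


if __name__ == "__main__":
    propose, verify = make_toy_problem(seed=0)
    action, score = select_action(propose, verify, history=[])
    print(f"chosen action={action:.3f}, verifier score={score:.3f}")
```

In this toy setup, spending more rounds or more candidates per round can only keep or improve the best verifier score, which is the basic trade of extra inference compute for quality that test-time scaling relies on. How E-TTS actually combines reasoning and action scaling is described in the paper, not here.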

Impact: These advances could lead to more capable robots that better understand and interact with human behavior and their environment.

Ranking rationale: Two new research papers introduce a benchmark and a framework for robot manipulation.

AI-generated summary · Google Gemini · based on 4 sources.

Sources [4]

  1. arXiv cs.AI TIER_1 English(EN) · Baiqi Li, Ce Zhang, Yu Fang, Yue Yang, Shangzhe Li, Mingyu Ding, Gedas Bertasius ·

    WatchAct: A Benchmark for Behavior-Grounded Robot Manipulation

    arXiv:2606.26443v1 Announce Type: cross Abstract: A robot working alongside people must reason about what they have done, in what order, and with what intent. Video carries the spatial layouts, object histories, and gestures that language leaves underspecified, yet today's manipu…

  2. arXiv cs.AI TIER_1 English(EN) · Wen Ye, Peiyan Li, Tingyu Yuan, Yuan Xu, Xiangnan Wu, Chaoyang Zhao, Jing Liu, Nianfeng Liu, Yan Huang, Liang Wang ·

    E-TTS: A New Embodied Test-Time Scaling Framework for Robotic Manipulation

    arXiv:2606.27268v1 Announce Type: cross Abstract: Recently, a few works have made early attempts to study test-time scaling for embodied tasks. However, two major challenges remain unsolved: (1) reasoning can effectively improve the performance of the policy, but its scaling mech…

  3. Hugging Face Daily Papers TIER_1 English(EN) ·

    E-TTS: A New Embodied Test-Time Scaling Framework for Robotic Manipulation

    Recently, a few works have made early attempts to study test-time scaling for embodied tasks. However, two major challenges remain unsolved: (1) reasoning can effectively improve the performance of the policy, but its scaling mechanism has seldom been studied; (2) historical info…

  4. arXiv cs.AI TIER_1 English(EN) · Liang Wang ·

    E-TTS: A New Embodied Test-Time Scaling Framework for Robotic Manipulation

    Recently, a few works have made early attempts to study test-time scaling for embodied tasks. However, two major challenges remain unsolved: (1) reasoning can effectively improve the performance of the policy, but its scaling mechanism has seldom been studied; (2) historical info…