New benchmarks and frameworks advance robot manipulation reasoning

By PulseAugur Editorial · [4 sources] · 2026-06-25 16:50

Researchers have introduced a new benchmark and a new framework for robot manipulation. WatchAct is a benchmark that evaluates a robot's ability to reason about observed human behavior, using video and language instructions to assess event parsing, procedural reasoning, and intent inference. E-TTS is a test-time scaling framework that unifies reasoning and action scaling for robotic manipulation by incorporating historical context and iterative refinement with vision-language verifiers. Both aim to improve robot performance on complex, long-horizon tasks, and E-TTS reports significant gains in simulation and real-world scenarios without retraining.

IMPACT These advancements could lead to more capable robots that can better understand and interact with human behavior and environments.

RANK_REASON Two new research papers introducing a benchmark and a framework for robotic manipulation.

AI-generated summary · Google Gemini · from 4 sources.

COVERAGE [4]

arXiv cs.AI TIER_1 English(EN) · Baiqi Li, Ce Zhang, Yu Fang, Yue Yang, Shangzhe Li, Mingyu Ding, Gedas Bertasius · 2026-06-26 04:00

WatchAct: A Benchmark for Behavior-Grounded Robot Manipulation

arXiv:2606.26443v1. Abstract: A robot working alongside people must reason about what they have done, in what order, and with what intent. Video carries the spatial layouts, object histories, and gestures that language leaves underspecified, yet today's manipu…

arXiv cs.AI TIER_1 English(EN) · Wen Ye, Peiyan Li, Tingyu Yuan, Yuan Xu, Xiangnan Wu, Chaoyang Zhao, Jing Liu, Nianfeng Liu, Yan Huang, Liang Wang · 2026-06-26 04:00

E-TTS: A New Embodied Test-Time Scaling Framework for Robotic Manipulation

arXiv:2606.27268v1. Abstract: Recently, a few works have made early attempts to study test-time scaling for embodied tasks. However, two major challenges remain unsolved: (1) reasoning can effectively improve the performance of the policy, but its scaling mech…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-25 16:50

E-TTS: A New Embodied Test-Time Scaling Framework for Robotic Manipulation

Recently, a few works have made early attempts to study test-time scaling for embodied tasks. However, two major challenges remain unsolved: (1) reasoning can effectively improve the performance of the policy, but its scaling mechanism has seldom been studied; (2) historical info…

arXiv cs.AI TIER_1 English(EN) · Liang Wang · 2026-06-25 16:50

E-TTS: A New Embodied Test-Time Scaling Framework for Robotic Manipulation

Recently, a few works have made early attempts to study test-time scaling for embodied tasks. However, two major challenges remain unsolved: (1) reasoning can effectively improve the performance of the policy, but its scaling mechanism has seldom been studied; (2) historical info…
