Microsoft Research has introduced GroundedPlanBench, a new benchmark for evaluating how well vision-language models (VLMs) perform long-horizon task planning for robot manipulation. Current VLM-based robot planners often struggle with complex tasks because natural-language instructions are ambiguous and because action planning is decoupled from location planning. Alongside the benchmark, the researchers propose Video-to-Spatially Grounded Planning (V2GP), a framework that enables VLMs to jointly determine both what actions to take and where they should occur; in evaluations, it outperforms decoupled approaches.
Summary written by gemini-2.5-flash-lite from 1 source.