Microsoft Research has introduced GroundedPlanBench, a new benchmark for evaluating how well vision-language models (VLMs) perform long-horizon task planning for robot manipulation. Current VLM-based robot planners often struggle with complex tasks because natural-language instructions are ambiguous and because action planning is decoupled from location planning. Alongside the benchmark, the researchers propose Video-to-Spatially Grounded Planning (V2GP), a framework that enables VLMs to jointly determine both what actions to take and where they should occur; in evaluations, it outperforms decoupled approaches.
Summary written by gemini-2.5-flash-lite from 1 source.