Researchers have introduced SleepWalk, a new benchmark for rigorously testing the instruction-guided vision-language navigation capabilities of AI models. The benchmark focuses on localized, interaction-centric embodied reasoning in 3D environments: a model must predict a trajectory that follows natural-language instructions while respecting scene geometry and avoiding collisions. SleepWalk divides its tasks into three difficulty tiers, enabling detailed analysis of how models cope with increasing spatial and temporal complexity. Results reveal significant failures in grounded spatial reasoning, particularly on multi-step instructions and under occlusion.
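The summary does not describe SleepWalk's exact scoring protocol, but the evaluation criteria it names (instruction-conditioned trajectory prediction, scene-geometry compliance, collision avoidance) can be sketched generically. Everything below, including the `Scene`, `collides`, and `evaluate` names and the circle-obstacle geometry, is an illustrative assumption, not the benchmark's actual implementation:

```python
from dataclasses import dataclass

# Hypothetical sketch of trajectory evaluation: the agent must reach the
# instructed goal without overlapping any obstacle along its path.
# All names and geometry choices here are assumptions for illustration.

@dataclass
class Scene:
    obstacles: list[tuple[float, float, float]]  # (x, y, radius) circles

def collides(point, scene, agent_radius=0.2):
    """True if an agent centered at `point` overlaps any obstacle."""
    x, y = point
    return any((x - ox) ** 2 + (y - oy) ** 2 < (r + agent_radius) ** 2
               for ox, oy, r in scene.obstacles)

def evaluate(trajectory, goal, scene, goal_tolerance=0.5):
    """Success = every waypoint is collision-free and the path ends at the goal."""
    if any(collides(p, scene) for p in trajectory):
        return {"success": False, "reason": "collision"}
    gx, gy = goal
    fx, fy = trajectory[-1]
    reached = (fx - gx) ** 2 + (fy - gy) ** 2 <= goal_tolerance ** 2
    return {"success": reached, "reason": None if reached else "goal_not_reached"}

scene = Scene(obstacles=[(1.0, 1.0, 0.3)])
path = [(0.0, 0.0), (0.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
print(evaluate(path, goal=(2.0, 2.0), scene=scene))
# → {'success': True, 'reason': None}
```

A real benchmark of this kind would additionally score how well the trajectory matches the instruction's intermediate steps, which is where the summary says models struggle most.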
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT This benchmark will help advance grounded multimodal reasoning and the development of action-capable agents in 3D environments.
RANK_REASON The cluster describes a new academic benchmark paper for evaluating AI models.