PulseAugur

New SleepWalk benchmark tests AI's 3D navigation and instruction grounding

Researchers have introduced SleepWalk, a benchmark that stress-tests instruction-guided vision-language navigation in AI models. It focuses on localized, interaction-centric embodied reasoning within 3D environments, evaluating whether a model can predict a trajectory that follows natural language instructions while respecting scene geometry and avoiding collisions. SleepWalk sorts tasks into three difficulty tiers to allow detailed analysis of how models handle increasing spatial and temporal complexity. The benchmark reveals significant failures in grounded spatial reasoning, particularly with multi-step instructions and occlusion.

Impact: This benchmark will help advance grounded multimodal reasoning and the development of action-capable agents in 3D environments.

Ranking rationale: The cluster describes a new academic benchmark paper for evaluating AI models.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 1 source. How we write summaries →


Sources [1]

  1. arXiv cs.CV · English (EN) · Amitava Das

    SleepWalk: A Three-Tier Benchmark for Stress-Testing Instruction-Guided Vision-Language Navigation

    Vision-Language Models (VLMs) have advanced rapidly in multimodal perception and language understanding, yet it remains unclear whether they can reliably ground language into spatially coherent, plausibly executable actions in 3D digital environments. We introduce SleepWalk, a be…