S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence
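S-Agent framework enhances VLMs for 3D spatial reasoning · Tracking 4 sources
By the PulseAugur editorial team · [7 sources]
Researchers have introduced S-Agent, a new framework designed to enhance the spatial reasoning of vision-language models (VLMs) in 3D environments. S-Agent integrates temporal memory with a set of spatial tools, enabling continuous understanding of the 3D world from multi-view images and moving beyond static, frame-level analysis. The framework lets VLMs act as semantic planners that decide what evidence is needed, while the spatial tools localize objects in 2D, lift them to 3D, and aggregate this information into spatial knowledge. Experiments show that S-Agent improves both open-source and closed-source VLMs without retraining, and that the fine-tuned version, S-Agent-8B, performs comparably to advanced models such as GPT-5.4 and Gemini 3.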
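The localize-in-2D, lift-to-3D, aggregate pipeline described above can be illustrated with a minimal sketch. The summary does not specify S-Agent's actual tools, so everything here is an assumption for illustration: `lift_to_3d` uses standard pinhole back-projection with a known depth and camera pose, and `SpatialMemory` is a hypothetical stand-in for the temporal memory that simply averages each object's position across views.

```python
import math

def lift_to_3d(pixel_uv, depth, intrinsics, rotation, translation):
    """Back-project a pixel with known depth into the world frame (pinhole model)."""
    u, v = pixel_uv
    fx, fy, cx, cy = intrinsics
    # Point in the camera frame.
    p_cam = ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)
    # Rigid transform: camera-to-world rotation, then translation.
    return tuple(
        sum(rotation[i][j] * p_cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    )

class SpatialMemory:
    """Hypothetical temporal memory: running mean of each object's 3D position."""

    def __init__(self):
        self._sums = {}
        self._counts = {}

    def add(self, label, point):
        prev = self._sums.get(label, (0.0, 0.0, 0.0))
        self._sums[label] = tuple(p + q for p, q in zip(prev, point))
        self._counts[label] = self._counts.get(label, 0) + 1

    def position(self, label):
        n = self._counts[label]
        return tuple(s / n for s in self._sums[label])

    def distance(self, a, b):
        return math.dist(self.position(a), self.position(b))

# Assumed camera parameters (fx, fy, cx, cy) and an unrotated camera.
INTRINSICS = (500.0, 500.0, 320.0, 240.0)
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

memory = SpatialMemory()
# View 1: camera at the world origin sees both objects at 2 m depth.
memory.add("chair", lift_to_3d((320, 240), 2.0, INTRINSICS, IDENTITY, (0.0, 0.0, 0.0)))
memory.add("lamp", lift_to_3d((420, 240), 2.0, INTRINSICS, IDENTITY, (0.0, 0.0, 0.0)))
# View 2: camera moved 1 m along x and re-observes the chair.
memory.add("chair", lift_to_3d((70, 240), 2.0, INTRINSICS, IDENTITY, (1.0, 0.0, 0.0)))

chair_to_lamp = memory.distance("chair", "lamp")
```

In this toy setup the second view confirms the chair's world position rather than creating a duplicate, which is the kind of cross-frame consistency that a stateless, frame-level analysis cannot provide. In a planner-driven system, a question such as "how far is the chair from the lamp?" would be answered from the aggregated memory rather than from any single image.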
AI
S-Agent is a spatial reasoning framework that enhances vision-language models (VLMs) with temporal memory and hierarchical spatial tools to enable continuous 3D world understanding from multi-view imagery.
arXiv:2606.20515v1. Abstract: Real-world spatial intelligence requires reasoning over a continuous and evolving 3D world, yet existing VLMs and tool-augmented agents largely remain tied to static, stateless inference from isolated visual observations. We introdu…
Real-world spatial intelligence requires reasoning over a continuous and evolving 3D world, yet existing VLMs and tool-augmented agents largely remain tied to static, stateless inference from isolated visual observations. We introduce S-Agent, a spatial tool-use…
SpatialClaw is a training-free agent that writes Python in a persistent kernel, composing perception tools for 3D spatial reasoning. The post https://www.marktechpost.com/2026/06/19/nvidia-ai-introduce-spatialclaw-a-training-free-agent-that-treats-code-as-the-ac…
🤖 NVIDIA's SpatialClaw boosts spatial reasoning in VLMs by 11.2 points. NVIDIA's SpatialClaw framework has increased spatial reasoning accuracy in vision-language models by 11.2 points over SpaceTools, reaching 59.9% average accuracy across 20 benchmarks. This new training-free fr…
NVIDIA's SpatialClaw is a training-free framework for spatial reasoning that treats code as the action interface. Across 20 benchmarks it reaches 59.9% accuracy, outperforming SpaceTools by 11.2 points. https://www.marktechpost.com/2026/06/19/nvidia-ai-introduce-spatialclaw-a-t…
NVIDIA has unveiled SpatialClaw, a training-free AI agent that treats code as the action interface for spatial reasoning. Using a Python kernel to compose perception tools, it achieves 59.9% accuracy across 20 benchmarks, outperforming prior approaches by over 11 points. https:/…
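The "code as the action interface" idea in the SpatialClaw items above can be sketched roughly as follows. This is not NVIDIA's implementation: `PersistentKernel` and the stubbed `detect` tool are hypothetical names, and the sketch only assumes that each agent step is a Python snippet executed in a namespace that survives between steps, so later snippets can reuse earlier results.

```python
import contextlib
import io

class PersistentKernel:
    """Hypothetical kernel: runs code snippets in one shared, long-lived namespace."""

    def __init__(self, tools):
        # Tools are exposed to the agent's code as ordinary Python names.
        self.namespace = dict(tools)

    def run(self, code):
        # Capture printed output so it can be returned to the model as an observation.
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self.namespace)
        return buffer.getvalue()

def detect(label):
    # Stub standing in for a real perception tool; the positions are made up.
    positions = {"chair": (0.0, 0.0, 2.0), "lamp": (3.0, 4.0, 2.0)}
    return positions[label]

kernel = PersistentKernel({"detect": detect})
# Step 1: the agent calls a perception tool and stores the results.
kernel.run("chair = detect('chair')\nlamp = detect('lamp')")
# Step 2: a later snippet composes those stored results into an answer.
output = kernel.run("import math\nprint(math.dist(chair, lamp))")
```

Because `chair` and `lamp` stay in the namespace after the first step, the second snippet can combine them without re-running perception. A real system would also need sandboxing, since `exec` runs arbitrary code.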