PulseAugur
English(EN) S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence

S-Agent framework enhances VLMs for 3D spatial reasoning · Tracking 4 sources

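Researchers have introduced S-Agent, a new framework designed to strengthen the spatial reasoning of vision-language models (VLMs) in 3D environments. S-Agent combines temporal memory with a set of spatial tools, so that a model can build a continuous understanding of the 3D world from multi-view images instead of analyzing static, individual frames. In this framework the VLM acts as a semantic planner that decides what evidence it needs, while the spatial tools localize objects in 2D, lift them into 3D, and aggregate the results into spatial knowledge. Experiments show that S-Agent improves both open-source and closed-source VLMs without retraining, and that a fine-tuned version, S-Agent-8B, performs comparably to advanced models such as GPT-5.4 and Gemini 3.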
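To make the "localize in 2D, lift to 3D, aggregate into spatial knowledge" pipeline more concrete, here is a minimal sketch of one common way such a step can work. It assumes a pinhole camera with known intrinsics, known per-frame camera poses, and a metric depth value for each detection; the class and method names (`Intrinsics`, `SpatialMemory`, `observe`) are hypothetical and are not taken from the S-Agent paper or code.

```python
import math
from dataclasses import dataclass, field

# Hypothetical illustration only: pinhole back-projection ("2D -> 3D") plus a
# per-object memory that aggregates observations across views. Not S-Agent's code.


@dataclass
class Intrinsics:
    fx: float  # focal length in pixels (x)
    fy: float  # focal length in pixels (y)
    cx: float  # principal point (x)
    cy: float  # principal point (y)


def lift_to_camera(u, v, depth, K):
    """Back-project pixel (u, v) with metric depth into camera coordinates."""
    x = (u - K.cx) * depth / K.fx
    y = (v - K.cy) * depth / K.fy
    return (x, y, depth)


def camera_to_world(p, R, t):
    """Rigid camera-to-world transform: p_world = R @ p_cam + t (R is 3x3, t a 3-vector)."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


@dataclass
class SpatialMemory:
    """Hypothetical temporal memory: world-frame points per object label, across frames."""
    observations: dict = field(default_factory=dict)

    def observe(self, label, u, v, depth, K, R, t):
        # 2D localization is assumed to happen upstream (u, v come from a detector);
        # here the detection is lifted to 3D and stored in a shared world frame.
        point = camera_to_world(lift_to_camera(u, v, depth, K), R, t)
        self.observations.setdefault(label, []).append(point)

    def centroid(self, label):
        # Aggregate every view of one object into a single position estimate.
        pts = self.observations[label]
        return tuple(sum(p[i] for p in pts) / len(pts) for i in range(3))

    def distance(self, a, b):
        # A spatial query answered from accumulated memory, not from a single frame.
        return math.dist(self.centroid(a), self.centroid(b))


if __name__ == "__main__":
    K = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    mem = SpatialMemory()
    # Frame 1: camera at the world origin.
    mem.observe("chair", 320, 240, 2.0, K, I, (0.0, 0.0, 0.0))
    mem.observe("table", 570, 240, 2.0, K, I, (0.0, 0.0, 0.0))
    # Frame 2: camera moved 0.5 m along x, so the same objects appear at new pixels.
    mem.observe("chair", 195, 240, 2.0, K, I, (0.5, 0.0, 0.0))
    mem.observe("table", 445, 240, 2.0, K, I, (0.5, 0.0, 0.0))
    print(mem.distance("chair", "table"))  # 1.0
```

Because both frames are expressed in one world frame, the two views of each object agree, and the query is answered from memory rather than from whichever frame happens to be current. A real system would also need to handle noisy depth, pose estimation, and matching detections across views, none of which this sketch attempts.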

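Impact: The framework could substantially improve AI's ability to understand and interact with 3D environments, with implications for robotics, autonomous systems, and virtual reality.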

Ranking rationale: This cluster covers a recent research paper presenting a novel framework for spatial reasoning in AI models.


AI-generated summary · Google Gemini · from 7 sources.


Coverage sources [7]

  1. Hugging Face Daily Papers TIER_1 English(EN) ·

    S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence

    S-Agent is a spatial reasoning framework that enhances visual language models with temporal memory and hierarchical spatial tools to enable continuous 3D world understanding from multi-view imagery.

  2. arXiv cs.CV TIER_1 English(EN) · Yalun Dai, Hao Li, Shulin Tian, Runmao Yao, Yuhao Dong, Fangzhou Hong, Zhaoxi Chen, Fangfu Liu, Baoliang Tian, Dingwen Zhang, Tao Wang, Kim-Hui Yap, Ziwei Liu ·

    S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence

    arXiv:2606.20515v1 Announce Type: new Abstract: Real-world spatial intelligence requires reasoning over a continuous and evolving 3D world, yet existing VLMs and tool-augmented agents largely remain tied to static, stateless inference from isolated visual observations. We introdu…

  3. arXiv cs.CV TIER_1 English(EN) · Ziwei Liu ·

    S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence

    Real-world spatial intelligence requires reasoning over a continuous and evolving 3D world, yet existing VLMs and tool-augmented agents largely remain tied to static, stateless inference from isolated visual observations. We introduce S-Agent, a spatial tool-use…

  4. MarkTechPost TIER_1 English(EN) · Asif Razzaq ·

    NVIDIA AI Introduces SpatialClaw: A Training-Free Agent That Treats Code as the Action Interface for Spatial Reasoning

    SpatialClaw is a training-free agent that writes Python in a persistent kernel, composing perception tools for 3D spatial reasoning.

  5. Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] ·

    🤖 NVIDIA's SpatialClaw improves VLM spatial reasoning by 11.2 percentage points. NVIDIA's SpatialClaw framework increases the spatial reasoning accuracy of vision-language models

    🤖 NVIDIA's SpatialClaw boosts spatial reasoning in VLMs by 11.2 points. NVIDIA's SpatialClaw framework has increased spatial reasoning accuracy in vision language models by 11.2 points over SpaceTools, reaching 59.9% average accuracy across 20 benchmarks. This new training-free fr…

  6. Mastodon — mastodon.social TIER_1 English(EN) · [email protected] ·

    NVIDIA's SpatialClaw is a training-free framework for spatial reasoning that treats code as the action interface. Across 20 benchmarks, it reaches 59.9% accuracy.

    NVIDIA's SpatialClaw is a training-free framework for spatial reasoning that treats code as the action interface. Across 20 benchmarks it reaches 59.9% accuracy, outperforming SpaceTools by 11.2 points. https://www.marktechpost.com/2026/06/19/nvidia-ai-introduce-spatialclaw-a-t…

  7. Mastodon — mastodon.social TIER_1 English(EN) · [email protected] ·

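    NVIDIA releases SpatialClaw, a training-free AI agent that treats code as the action interface for spatial reasoning. It uses a Python kernel to compose perception…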

    NVIDIA has unveiled SpatialClaw, a training-free AI agent that treats code as the action interface for spatial reasoning. Using a Python kernel to compose perception tools, it achieves 59.9% accuracy across 20 benchmarks, outperforming prior approaches by over 11 points. https:/…
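Several of the items above describe SpatialClaw as writing Python in a persistent kernel and composing perception tools, with code serving as the action interface. The sketch below only illustrates that general idea under stated assumptions: a plain Python dictionary stands in for the kernel state, and the `PersistentKernel` class and `locate_3d` tool are hypothetical, with made-up scene data. It is not NVIDIA's implementation or API.

```python
import math

# Hypothetical sketch of "code as the action interface": the agent emits Python
# snippets that run in one persistent namespace, so results from earlier steps
# remain available to later ones. Tool names and scene data are made up.


class PersistentKernel:
    def __init__(self, tools):
        # One shared globals dict plays the role of long-lived kernel state.
        self.namespace = dict(tools)

    def run(self, code):
        # exec() is NOT sandboxed; a real system would isolate untrusted code.
        exec(code, self.namespace)
        # By convention in this sketch, a step may publish its result as `answer`.
        return self.namespace.get("answer")


def locate_3d(label):
    """Hypothetical perception tool returning an object's 3D position in meters."""
    fake_scene = {"mug": (0.2, 0.0, 1.0), "laptop": (0.8, 0.0, 1.0)}
    return fake_scene[label]


if __name__ == "__main__":
    kernel = PersistentKernel({"locate_3d": locate_3d, "math": math})
    kernel.run("mug = locate_3d('mug')")        # step 1: call a perception tool
    kernel.run("laptop = locate_3d('laptop')")  # step 2: state from step 1 persists
    result = kernel.run("answer = math.dist(mug, laptop)")  # step 3: compose results
    print(round(result, 3))  # 0.6
```

The design point the sketch tries to capture is that each action is ordinary code, so the agent can combine tool outputs with arithmetic and control flow of its own instead of being limited to a fixed menu of tool calls.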