Beyond APIs: Probing the Limits of MLLMs in Physical Tool Use

New benchmark shows multimodal large language models struggle with physical tool use

By the PulseAugur editorial team · [1 source] · 2026-06-09 12:49

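Researchers have introduced PhysTool-Bench, a new benchmark designed to evaluate how well multimodal large language models (MLLMs) understand and use physical tools. The benchmark contains more than 2,500 queries covering nearly 2,700 real-world tools drawn from a wide range of industries. Initial tests on 13 leading MLLMs reveal significant limitations: the best-performing model correctly identified only 58.7% of the tools and completed only 21.0% of the tasks. The results point to a critical gap in how these models perceive physical objects and reason about their function, both of which embodied AI applications depend on.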

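Impact: The results underscore a key limitation of MLLMs in interacting with the physical world and indicate that embodied AI needs stronger perception and functional common sense.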

Ranking rationale: This cluster contains a research paper introducing a new benchmark for evaluating MLLMs.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.AI · Wenjie Li · 2026-06-09 12:49

Beyond APIs: Probing the Limits of MLLMs in Physical Tool Use

Multimodal Large Language Models (MLLMs) excel at utilizing digital APIs and increasingly serve as the "brain" of embodied AI, instructing robots to interact with the physical world. In such embodied settings, a central capability is the use of physical tools, which underpins MLL…
