LLM agents know when to use tools but fail to act on it

By PulseAugur Editorial Team · [1 source] · 2026-05-22 04:00

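Researchers have introduced When2Tool, a new benchmark for evaluating when large language model (LLM) agents should call external tools. The benchmark shows that LLMs internally represent whether a tool is needed, and that this signal can be detected from their hidden states, yet the models fail to act on it during generation. A proposed method, Probe&Prefill, exploits this internal signal to substantially reduce unnecessary tool calls with minimal loss of accuracy, outperforming existing baselines.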
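To make the idea concrete, here is a minimal sketch of what a probe-then-prefill pipeline could look like. It is not the paper's implementation: the summary only says that tool necessity is detectable from hidden states and that Probe&Prefill uses that signal. The logistic-regression probe, the synthetic "hidden state" vectors, the 0.5 threshold, and the two prefill strings are all assumptions made for illustration.

```python
import math
import random

# Hypothetical prefill strings; the paper's actual wording is not given in the summary.
TOOL_PREFILL = "I need to call a tool for this."
DIRECT_PREFILL = "I can answer this directly."


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probe_score(w, b, state):
    # Probability that a tool is needed, according to the linear probe.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, state)) + b)


def train_probe(states, labels, lr=0.5, epochs=200):
    # Fit a logistic-regression probe by batch gradient descent.
    # Label 1 means "tool needed", label 0 means "answer directly".
    dim = len(states[0])
    w, b, n = [0.0] * dim, 0.0, len(states)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(states, labels):
            err = probe_score(w, b, x) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def choose_prefill(score, threshold=0.5):
    # Turn the probe's score into the text used to start the model's reply.
    return TOOL_PREFILL if score >= threshold else DIRECT_PREFILL


def synthetic_states(n, dim, seed):
    # Stand-in for real hidden states: the label shifts the first coordinate.
    rng = random.Random(seed)
    states, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(0, 1) for _ in range(dim)]
        x[0] += 3.0 if y else -3.0
        states.append(x)
        labels.append(y)
    return states, labels


if __name__ == "__main__":
    train_x, train_y = synthetic_states(200, 8, seed=0)
    w, b = train_probe(train_x, train_y)
    query_state = synthetic_states(1, 8, seed=42)[0][0]
    print(choose_prefill(probe_score(w, b, query_state)))
```

In a real agent, the vectors would come from a chosen layer of the model, and the selected prefix would be placed at the start of the assistant turn before decoding continues. Which layer to read and how the prefix is worded are design choices this sketch does not settle.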

Impact: Cutting unnecessary tool calls makes LLM agents more efficient, which could lower the cost and latency of AI applications.

Ranking rationale: This cluster contains an academic paper that proposes a new benchmark and a new method for evaluating tool use in LLM agents.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL TIER_1 English(EN) · Chung-En Sun, Linbo Liu, Ge Yan, Zimo Wang, Tsui-Wei Weng · 2026-05-22 04:00

LLM Agents Already Know When to Call Tools -- Even Without Reasoning

arXiv:2605.09252v2. Abstract: Tool-augmented LLM agents tend to call tools indiscriminately, even when the model can answer directly. Each unnecessary call wastes API fees and latency, yet no existing benchmark systematically studies when a tool call is actu…
