English(EN) Evaluating using Mock Tool Calls to Quarantine Untrusted Prompt Inputs

研究表明模拟工具调用未能提高 LLM 提示的鲁棒性

作者 PulseAugur 编辑部 · [1 个来源] · 2026-06-05 22:43

研究人员探索了使用模拟工具调用来隔离 LLM 提示中的不可信输入，旨在提高鲁棒性。他们的研究在 ICML 的一个研讨会上发表，测试了该方法在三个任务和七个模型上的表现。与预期相反，模拟工具包装方法并未持续提高性能，在某些情况下甚至导致结果变差，尤其是在对抗性任务上。 AI

影响这项研究表明，一种用于提高 LLM 提示安全性的提议方法可能无效，凸显了需要更好的原始方法来处理不可信输入。

排序理由这是一篇研究笔记/研讨会论文，展示了关于 LLM 提示鲁棒性的实验结果。[lever_c_demoted from research: ic=1 ai=1.0]

AI 生成摘要 · Google Gemini · 来自 1 个来源。我们如何撰写摘要 →

报道来源 [1]

LessWrong (AI tag) TIER_1 English(EN) · dgros · 2026-06-05 22:43

使用模拟工具调用评估以隔离不受信任的提示输入

<p><em>This is a small study that explores using tool calls to wrap untrusted parts of prompts. OpenAI's model spec considers tool results the least trusted kind of input. If tool-wrapping helped, it would be an easy way to improve robustness while using existing APIs models alre…