Study Finds: AI Agent Safety Requires External Enforcement, Not Internal Refusal

By PulseAugur Editorial · [1 source] · 2026-06-30 04:00

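A new paper argues that the current approach to AI agent safety, which focuses on refusing unsafe inputs, is fundamentally flawed. The authors argue that agent harm stems from a mismatch between the authority an agent has been granted and the authority it actually exercises, a property that is absent from the text data models are trained on. They propose that action safety must be achieved through an externally enforced principle of least privilege, and that it should be evaluated as action alignment rather than as a simple refusal score.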
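As a rough illustration of what externally enforced least privilege could look like, here is a minimal Python sketch. It is not taken from the paper: the names `ToolCall`, `Grant`, and `Gate`, and the tool and resource strings, are all hypothetical. The point it shows is that the check runs outside the model and compares each proposed action against the authority that was actually granted, instead of relying on the model to refuse.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolCall:
    """One action the agent proposes (hypothetical shape)."""
    tool: str       # e.g. "send_message"
    resource: str   # e.g. "user:alice"


@dataclass
class Grant:
    """Authority delegated for a single task, as explicit (tool, resource) pairs."""
    allowed: frozenset


@dataclass
class Gate:
    """Sits outside the model and checks every proposed call before execution."""
    grant: Grant
    log: list = field(default_factory=list)

    def authorize(self, call: ToolCall) -> bool:
        # Allow only what was explicitly granted; everything else is denied.
        ok = (call.tool, call.resource) in self.grant.allowed
        self.log.append((call, ok))
        return ok

    def exceeded(self) -> list:
        # Calls where the agent tried to exercise authority it was not granted.
        return [call for call, ok in self.log if not ok]


# Assumed task: answer a support ticket. Deleting records is not part of the grant.
grant = Grant(allowed=frozenset({
    ("read_record", "crm:ticket-42"),
    ("send_message", "user:alice"),
}))
gate = Gate(grant)

proposed = [
    ToolCall("read_record", "crm:ticket-42"),
    ToolCall("delete_record", "crm:ticket-42"),  # outside the grant
    ToolCall("send_message", "user:alice"),
]
executed = [call for call in proposed if gate.authorize(call)]
```

In this sketch the gate's log, not the model's refusal behavior, is what gets inspected: `gate.exceeded()` lists the attempts to act beyond the grant. How the paper itself scores action alignment is not specified in the summary, so no metric is assumed here.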

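Impact: Current safety methods for AI agents are insufficient; robust action alignment requires a shift to external, least-privilege enforcement.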

Ranking rationale: This cluster contains an academic paper discussing AI safety mechanisms.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI TIER_1 English(EN) · Shawn Li, Yue Zhao · 2026-06-30 04:00

Agent Safety Is Action Alignment

arXiv:2606.28739v1 · Abstract: Large language models increasingly act as agents: they call tools, move money, delete records, and send messages on a user's behalf. To keep them safe, practitioners imported the chatbot-era recipe (train the model to refuse unsafe …