Security--Fidelity Tradeoffs: The Hidden Cost of Prompt Injection Defense

New Benchmark Reveals a Security-Fidelity Tradeoff in LLM Defenses

By the PulseAugur editorial team · [1 source] · 2026-07-01 04:00

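A new benchmark called SecFid has been developed to measure the tradeoff between security and fidelity in large language models (LLMs) facing prompt injection attacks. The researchers found that current defenses against these attacks often impair a model's ability to faithfully process and preserve information, especially in tasks such as translation or document editing. Across a large number of examples and configurations, no model or defense achieved both high security and high fidelity: the most secure defenses significantly reduced fidelity, and vice versa. The study indicates that the best balance depends on the specific deployment environment and on the relative costs of security breaches and data loss.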
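The point about relative costs can be made concrete with a minimal sketch. Everything below is an illustrative assumption and is not taken from the paper or from SecFid: the `DefenseProfile` type, the two defense names, the rates, and the simple linear cost model are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefenseProfile:
    """Hypothetical summary of one defense; the fields are assumptions for illustration."""
    name: str
    attack_success_rate: float  # fraction of injected instructions that get followed
    fidelity_loss_rate: float   # fraction of benign tasks whose content is corrupted


def expected_cost(profile: DefenseProfile, breach_cost: float, data_loss_cost: float) -> float:
    """Linear cost model (an assumption): weight each failure rate by its cost."""
    return (profile.attack_success_rate * breach_cost
            + profile.fidelity_loss_rate * data_loss_cost)


def pick_defense(profiles, breach_cost: float, data_loss_cost: float) -> DefenseProfile:
    """Return the profile with the lowest expected cost for this deployment."""
    return min(profiles, key=lambda p: expected_cost(p, breach_cost, data_loss_cost))


# Made-up numbers, chosen only to show the two ends of the tradeoff.
profiles = [
    DefenseProfile("strict_filter", attack_success_rate=0.02, fidelity_loss_rate=0.30),
    DefenseProfile("light_filter", attack_success_rate=0.20, fidelity_loss_rate=0.05),
]

# A deployment where a breach is far more expensive than lost content.
breach_sensitive = pick_defense(profiles, breach_cost=100.0, data_loss_cost=1.0)

# A deployment (for example, translation) where lost content is the larger cost.
fidelity_sensitive = pick_defense(profiles, breach_cost=1.0, data_loss_cost=10.0)
```

Under these made-up numbers the preferred defense flips between the two deployments, which is the qualitative behavior the summary describes; real rates would have to come from a measurement such as the benchmark itself.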

Impact: Highlights a key security challenge in deploying LLMs, one that affects the reliability of AI systems in real-world applications.

Ranking rationale: Academic paper introducing a new benchmark and an analysis of LLM security.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Mitchell Hermon, Rahul Gupta, Weitong Ruan, Ekraam Sabir, Haohan Wang · 2026-07-01 04:00

Security--Fidelity Tradeoffs: The Hidden Cost of Prompt Injection Defense

arXiv:2606.30783v1 Announce Type: cross Abstract: We identify a security-fidelity tradeoff in defending LLMs against indirect prompt injection: defenses resist injected instructions largely by suppressing untrusted text, which corrupts tasks that must preserve it, such as transla…

