The prompt injection attacks that worry me most aren't exploiting safety training. They're exploiting general-purpose training.

Prompt Injection Attacks Exploit General-Purpose AI Training, Not Safety Guardrails

By the PulseAugur editorial team · [1 source] · 2026-06-09 10:23

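A security researcher has observed that the most effective prompt injection attacks against AI models exploit their general-purpose training rather than their safety alignment specifically. These attacks lean on the model's built-in helpfulness and its drive for conversational coherence, reframing the context so that the model is tricked into acting against the user's intent. The researcher argues that better alignment may not counter these threats effectively, because the vulnerability lies in the core training that makes models conversational and helpful in the first place.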

Impact: Suggests that AI safety work on prompt injection should shift its focus from alignment to core training methods.

Ranking rationale: This cluster contains an opinion piece by a researcher discussing AI safety and prompt injection vulnerabilities.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/OpenAI TIER_2 English(EN) · /u/BordairAPI · 2026-06-09 10:23

The prompt injection attacks that worry me most aren't exploiting safety training. They're exploiting general-purpose training.

Six months watching adversarial input hit a detection API I built. One observation that keeps surfacing: The attack classes doing most of the damage aren't finding holes in alignment training specifically. They're using gener…