PulseAugur
English(EN) Hiding in Plain Floats: Steganographic Carriers for Indirect Prompt and Content Injection

New LLM steganography methods bypass text and activation defenses

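Researchers have identified new ways to embed hidden messages in large language model (LLM) pipelines that bypass conventional text-based safety measures. One technique carries the payload as structured floating-point parameters, evading detection even when text classifiers are in place. Another exploits the pseudorandom number generator used in LLM inference, embedding the message in the seed so that the secret can be reconstructed from the generated text alone. A further study shows that even sophisticated internal activation probes designed to detect such hidden messages can be evaded, although specific data-level interventions can restore detectability.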

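Impact: Reveals new attack vectors against LLM security and underscores the need for detection mechanisms more robust than simple text analysis.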

Ranking rationale: Multiple research papers detail new steganographic methods within LLMs and defenses against them.


AI-generated summary · Google Gemini · from 5 sources.

Sources [5]

  1. arXiv cs.AI · TIER_1 · English (EN) · Mudit Sinha, Sanika Chavan

    Hiding in Plain Floats: Steganographic Carriers for Indirect Prompt and Content Injection

    arXiv:2606.08403v1. Abstract: Text-centered prompt-injection defenses assume that the malicious signal is visible in one of the inspected text views. We study a reproducible LLM01-style indirect prompt/content-injection failure mode where that assumption break…

  2. arXiv cs.AI · TIER_1 · English (EN) · Felix Mächtle, Jonas Sander, Sebastian Berndt, Ben Weimar, Nils Loose, Thomas Eisenbarth

    Steganography Without Modification: Hidden Communication via LLM Seeds

    arXiv:2606.09135v1. Abstract: We demonstrate that widely deployed Large Language Model (LLM) inference stacks harbor a steganographic channel that requires no modification to model weights, sampling code, or output distributions. The channel exploits a structu…

  3. arXiv cs.LG · TIER_1 · English (EN) · Charles Westphal, Timothy Douglas, Keivan Navaie, Tiago Pimentel, Fernando E. Rosas

    Now You (Still) See Me: Detecting Evasive Steganographic Payloads in Large Language Models

    arXiv:2606.09411v1. Abstract: Large language models can be fine-tuned to encode prompt-borne secrets into fluent, seemingly benign outputs. This creates a steganographic exfiltration risk that is difficult to detect with output-level steganalysis. Recent work …

  4. arXiv cs.LG · TIER_1 · English (EN) · Fernando E. Rosas

    Now You (Still) See Me: Detecting Evasive Steganographic Payloads in Large Language Models

    Large language models can be fine-tuned to encode prompt-borne secrets into fluent, seemingly benign outputs. This creates a steganographic exfiltration risk that is difficult to detect with output-level steganalysis. Recent work proposes mechanistic detection using linear probes…

  5. arXiv cs.AI · TIER_1 · English (EN) · Sanika Chavan

    Hiding in Plain Floats: Steganographic Carriers for Indirect Prompt and Content Injection

    Text-centered prompt-injection defenses assume that the malicious signal is visible in one of the inspected text views. We study a reproducible LLM01-style indirect prompt/content-injection failure mode where that assumption breaks: a payload caught in plain English slips past th…