PulseAugur

New LLM steganography methods bypass text, activation defenses

Researchers have described new ways to hide messages in Large Language Model (LLM) pipelines that bypass text-based security measures. One technique carries payloads as structured float parameters, which can evade detection even when text classifiers are in place. Another exploits the pseudo-random number generators used in LLM inference, embedding the message in the seed so that the secret can be reconstructed from the generated text alone. A further study shows that internal activation probes designed to detect such hidden messages can also be circumvented, though specific data-level interventions can restore detectability.
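The seed channel can be illustrated with a toy sketch. This is not the method from the paper: it assumes a hypothetical eight-word vocabulary, Python's `random.Random` standing in for an inference stack's sampler, and a 16-bit seed space small enough to search exhaustively. The names `VOCAB`, `generate`, and `recover_seed` are invented for this example.

```python
import random
from typing import List, Optional

# Hypothetical toy vocabulary standing in for a model's token distribution.
VOCAB = ["the", "a", "model", "token", "seed", "text", "runs", "fast"]


def generate(seed: int, n_tokens: int = 16) -> List[str]:
    """Sample tokens deterministically from a PRNG initialised with `seed`."""
    rng = random.Random(seed)
    return [rng.choice(VOCAB) for _ in range(n_tokens)]


def recover_seed(tokens: List[str], seed_bits: int = 16) -> Optional[int]:
    """Search the (assumed small) seed space for a seed that reproduces `tokens`."""
    for candidate in range(2 ** seed_bits):
        if generate(candidate, len(tokens)) == tokens:
            return candidate
    return None


# Sender: the secret itself is used as the seed; only the text is transmitted.
secret = int.from_bytes(b"Hi", "big")
text = generate(secret)

# Receiver: sees only `text` and knows the vocabulary and sampler.
recovered = recover_seed(text)
```

In this sketch the sampling code and the output distribution are untouched; the only thing the sender controls is the seed, which is why the output looks like ordinary sampled text. Real inference stacks use far larger seed spaces and real model distributions, so recovery there would need more than the brute-force loop shown here.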

IMPACT Reveals new attack vectors for LLM security and highlights the need for more robust detection mechanisms beyond simple text analysis.

RANK_REASON Multiple research papers detailing novel methods for steganography within LLMs and defenses against them.


AI-generated summary · Google Gemini · from 5 sources.

COVERAGE [5]

  1. arXiv cs.AI TIER_1 English(EN) · Mudit Sinha, Sanika Chavan ·

    Hiding in Plain Floats: Steganographic Carriers for Indirect Prompt and Content Injection

    arXiv:2606.08403v1 (cross-listed). Abstract: Text-centered prompt-injection defenses assume that the malicious signal is visible in one of the inspected text views. We study a reproducible LLM01-style indirect prompt/content-injection failure mode where that assumption break…

  2. arXiv cs.AI TIER_1 English(EN) · Felix Mächtle, Jonas Sander, Sebastian Berndt, Ben Weimar, Nils Loose, Thomas Eisenbarth ·

    Steganography Without Modification: Hidden Communication via LLM Seeds

    arXiv:2606.09135v1 (cross-listed). Abstract: We demonstrate that widely deployed Large Language Model (LLM) inference stacks harbor a steganographic channel that requires no modification to model weights, sampling code, or output distributions. The channel exploits a structu…

  3. arXiv cs.LG TIER_1 English(EN) · Charles Westphal, Timothy Douglas, Keivan Navaie, Tiago Pimentel, Fernando E. Rosas ·

    Now You (Still) See Me: Detecting Evasive Steganographic Payloads in LLMs

    arXiv:2606.09411v1 (cross-listed). Abstract: Large language models can be fine-tuned to encode prompt-borne secrets into fluent, seemingly benign outputs. This creates a steganographic exfiltration risk that is difficult to detect with output-level steganalysis. Recent work …

  4. arXiv cs.LG TIER_1 English(EN) · Fernando E. Rosas ·

    Now You (Still) See Me: Detecting Evasive Steganographic Payloads in LLMs

    Large language models can be fine-tuned to encode prompt-borne secrets into fluent, seemingly benign outputs. This creates a steganographic exfiltration risk that is difficult to detect with output-level steganalysis. Recent work proposes mechanistic detection using linear probes…

  5. arXiv cs.AI TIER_1 English(EN) · Sanika Chavan ·

    Hiding in Plain Floats: Steganographic Carriers for Indirect Prompt and Content Injection

    Text-centered prompt-injection defenses assume that the malicious signal is visible in one of the inspected text views. We study a reproducible LLM01-style indirect prompt/content-injection failure mode where that assumption breaks: a payload caught in plain English slips past th…