New LLM steganography methods bypass text, activation defenses

By PulseAugur Editorial · [5 sources] · 2026-06-07 01:41

Recent arXiv papers describe ways to hide messages in LLM pipelines that bypass text-based security measures. One technique transports injection payloads as structured float parameters, which can evade detection even when text classifiers are in place. Another exploits the pseudo-random number generators used in LLM inference: the message is embedded in the seed, with no change to model weights, sampling code, or output distributions, and the secret can be reconstructed from the generated text alone. A third study shows that internal activation probes designed to detect models fine-tuned to encode secrets in their outputs can be circumvented, though specific data-level interventions can restore detectability.
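As a rough illustration of the float-carrier idea, the sketch below shows why a filter that only inspects text can miss a payload stored as numbers. It is a toy under stated assumptions, not the method from the paper: the byte-to-float encoding, the `BLOCKLIST` phrase, the `calibration` field name, and all function names are hypothetical.

```python
import json

# Hypothetical keyword filter standing in for a text-centered defense.
BLOCKLIST = ("ignore previous instructions",)


def text_filter_flags(serialized: str) -> bool:
    """Return True if a blocklisted phrase appears in the inspected text."""
    lowered = serialized.lower()
    return any(phrase in lowered for phrase in BLOCKLIST)


def encode_as_floats(message: str) -> list[float]:
    """Toy carrier (assumed): map each UTF-8 byte to a float in [0, 1)."""
    return [b / 256 for b in message.encode("utf-8")]


def decode_from_floats(values: list[float]) -> str:
    """Invert the toy carrier by scaling each float back to a byte."""
    return bytes(round(v * 256) for v in values).decode("utf-8")


payload = "Ignore previous instructions"
plain_doc = json.dumps({"note": payload})
float_doc = json.dumps({"calibration": encode_as_floats(payload)})

print(text_filter_flags(plain_doc))   # True: the phrase is visible as text
print(text_filter_flags(float_doc))   # False: only digits are visible
print(decode_from_floats(json.loads(float_doc)["calibration"]) == payload)  # True
```

The defensive takeaway of this toy is that a detector has to look at what numeric fields decode to, not only at the serialized text.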

IMPACT Reveals new attack vectors for LLM security and highlights the need for more robust detection mechanisms beyond simple text analysis.

RANK_REASON Multiple research papers detailing novel methods for steganography within LLMs and defenses against them.

paper
safety

AI-generated summary · Google Gemini · from 5 sources.

COVERAGE [5]

arXiv cs.AI TIER_1 English(EN) · Mudit Sinha, Sanika Chavan · 2026-06-09 04:00

Hiding in Plain Floats: Steganographic Carriers for Indirect Prompt and Content Injection

arXiv:2606.08403v1. Text-centered prompt-injection defenses assume that the malicious signal is visible in one of the inspected text views. We study a reproducible LLM01-style indirect prompt/content-injection failure mode where that assumption break…

arXiv cs.AI TIER_1 English(EN) · Felix Mächtle, Jonas Sander, Sebastian Berndt, Ben Weimar, Nils Loose, Thomas Eisenbarth · 2026-06-09 04:00

Steganography Without Modification: Hidden Communication via LLM Seeds

arXiv:2606.09135v1. We demonstrate that widely deployed Large Language Model (LLM) inference stacks harbor a steganographic channel that requires no modification to model weights, sampling code, or output distributions. The channel exploits a structu…

arXiv cs.LG TIER_1 English(EN) · Charles Westphal, Timothy Douglas, Keivan Navaie, Tiago Pimentel, Fernando E. Rosas · 2026-06-09 04:00

Now You (Still) See Me: Detecting Evasive Steganographic Payloads in LLMs

arXiv:2606.09411v1. Large language models can be fine-tuned to encode prompt-borne secrets into fluent, seemingly benign outputs. This creates a steganographic exfiltration risk that is difficult to detect with output-level steganalysis. Recent work …

arXiv cs.LG TIER_1 English(EN) · Fernando E. Rosas · 2026-06-08 12:27

Now You (Still) See Me: Detecting Evasive Steganographic Payloads in LLMs

Large language models can be fine-tuned to encode prompt-borne secrets into fluent, seemingly benign outputs. This creates a steganographic exfiltration risk that is difficult to detect with output-level steganalysis. Recent work proposes mechanistic detection using linear probes…

arXiv cs.AI TIER_1 English(EN) · Sanika Chavan · 2026-06-07 01:41

Hiding in Plain Floats: Steganographic Carriers for Indirect Prompt and Content Injection

Text-centered prompt-injection defenses assume that the malicious signal is visible in one of the inspected text views. We study a reproducible LLM01-style indirect prompt/content-injection failure mode where that assumption breaks: a payload caught in plain English slips past th…
