Brief · PulseAugur

RESEARCH · arXiv cs.AI · 6d · [5 sources]

Hiding in Plain Floats: Steganographic Carriers for Indirect Prompt and Content Injection

Researchers have identified new methods for hiding messages in the inputs and outputs of large language models (LLMs) that bypass text-based security measures. One technique transports payloads as structured float parameters, which evade detection even when text classifiers are in place. Another exploits the pseudo-random number generators used in LLM inference, embedding a message in the seed so that the secret can be reconstructed from the generated text alone. A further study shows that internal activation probes designed to detect such hidden messages can also be circumvented, though specific data-level interventions can restore detectability.
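To make the float-carrier idea concrete, here is a minimal sketch of how text might be carried as numbers that a text classifier has nothing to match against. The brief does not describe the researchers' actual encoding, so the byte-to-float mapping, the function names, and the `temperature_curve` field are all assumptions made for illustration.

```python
import json


def encode_as_floats(message: str) -> list[float]:
    """Map each UTF-8 byte to a float in [0, 1] (hypothetical scheme)."""
    return [round(b / 255, 4) for b in message.encode("utf-8")]


def decode_from_floats(values: list[float]) -> str:
    """Invert encode_as_floats by scaling back up and rounding to a byte."""
    return bytes(round(v * 255) for v in values).decode("utf-8")


# The payload travels as an ordinary-looking numeric parameter.
# "temperature_curve" is an invented field name, not a real API parameter.
payload = encode_as_floats("meet at noon")
carrier = json.dumps({"temperature_curve": payload})

# A receiver that knows the scheme recovers the text from the floats alone.
recovered = decode_from_floats(json.loads(carrier)["temperature_curve"])
```

Four decimal places are enough here because the rounding error, once scaled back by 255, stays far below half a byte value. Note that the serialized array contains only digits and punctuation, which may explain why classifiers trained on text would miss such a carrier.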

IMPACT Reveals new attack vectors for LLM security and highlights the need for more robust detection mechanisms beyond simple text analysis.

roberta-base
LLM
Prompt Guard 2 + TF-IDF
Ministral-8B
Llama-3.1-8B
TF-IDF
Phi-4-14B
Prompt Guard 2
Qwen3-8B
LLMs
Qwen3-14B