Hiding in Plain Floats: Steganographic Carriers for Indirect Prompt and Content Injection
Researchers have identified new methods for hiding messages in Large Language Model (LLM) systems that bypass text-based security measures. One technique transports payloads as structured float parameters, which can evade detection even when text classifiers are in place. Another exploits the pseudo-random number generators used in LLM inference, embedding the message in the seed so that the secret can be reconstructed from the generated text alone. A further study shows that even internal activation probes designed to detect these hidden messages can be circumvented, though specific data-level interventions can restore detectability.
AI IMPACT: Reveals new attack vectors against LLMs and highlights the need for detection mechanisms that go beyond simple text analysis.
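To make the float-carrier idea concrete, the following is a minimal sketch of why a text classifier may see nothing suspicious in a list of numbers. The summary does not describe the actual encoding the researchers used, so this is an assumption: a generic low-order-bit scheme that hides one byte in the least significant 8 mantissa bits of each float32 value. The function names `embed_bytes` and `extract_bytes` are hypothetical and exist only for illustration.

```python
import struct


def _float_to_bits(value: float) -> int:
    """Return the 32-bit IEEE 754 pattern of a value rounded to float32."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def _bits_to_float(bits: int) -> float:
    """Return the float32 value for a 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def embed_bytes(carrier: list[float], payload: bytes) -> list[float]:
    """Hide one payload byte in the low 8 mantissa bits of each carrier value.

    Illustrative assumption only, not the scheme from the research above.
    """
    if len(payload) > len(carrier):
        raise ValueError("carrier has fewer values than payload bytes")
    out = []
    for i, value in enumerate(carrier):
        bits = _float_to_bits(value)
        if i < len(payload):
            # Overwrite the 8 least significant mantissa bits with the byte.
            bits = (bits & ~0xFF) | payload[i]
        out.append(_bits_to_float(bits))
    return out


def extract_bytes(values: list[float], length: int) -> bytes:
    """Recover `length` hidden bytes from the low mantissa bits."""
    return bytes(_float_to_bits(v) & 0xFF for v in values[:length])


carrier = [0.5, 1.25, 2.0]
stego = embed_bytes(carrier, b"hi")
recovered = extract_bytes(stego, 2)
```

In this sketch the modified values differ from the originals by far less than 0.0001, so they still look like ordinary parameters, while `recovered` holds the original two bytes. A defender who inspects only the text around such parameters would have nothing to flag, which is consistent with the summary's point that detection has to look beyond text analysis.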