I tested 5 LLMs for prompt-injection leaks. Same code, 0% to 90%.
A security researcher tested five large language models (LLMs) for prompt injection vulnerabilities, finding that leak rates varied significantly from 0% to 90% depending on the model used. The tests revealed that disguised prompts, phrased as legitimate requests, were more effective at eliciting sensitive information like API keys or system prompts than blunt injection attempts. Notably, while Anthropic's Claude Haiku 4.5 showed no key leaks, it had a 90% rate of disclosing its system prompt content, highlighting the need for multi-stage detection methods. AI
IMPACT Highlights critical security risks in LLM agents and the need for robust, multi-stage detection mechanisms before deployment.