Evaluating Prompting-Based Defenses Against Domain-Camouflaged Injection Attacks
A new research paper evaluates five prompting-based defenses against domain-camouflaged injection attacks, which embed malicious instructions using domain-appropriate vocabulary to evade standard detectors. The study tested these defenses across Claude Haiku, Llama 3.1 8B, and Gemini 2.0 Flash models in financial, legal, and general domains, utilizing 3,510 trials. Paraphrasing retrieved content emerged as the most effective defense, reducing attack success rates by 55-84% and outperforming Llama Guard 4 configurations. Defense efficacy varied significantly by model, with spotlighting proving effective for Claude Haiku but not Llama 3.1 8B, and financial domain deployments showing the highest residual risk. AI
IMPACT Establishes benchmark-based recommendations for practitioners to defend against sophisticated AI injection attacks.