Dive into Ambiguity: A*-Inspired Multi-Agents Commonsense Obfuscation Attack on LLM Prompts
Researchers have developed a new method for attacking large language models (LLMs) by generating semantically similar but intentionally ambiguous prompts. This A*-inspired framework uses a hierarchical rewrite strategy to gradually obfuscate prompts, aiming to induce commonsense hallucinations while preserving the original intent. The approach has demonstrated higher attack success rates and greater efficiency compared to previous methods across various LLMs. AI
IMPACT This research highlights a critical vulnerability in LLMs, potentially impacting their deployment in safety-critical applications and driving the development of more robust defense mechanisms.