Brief · PulseAugur

TOOL · arXiv cs.AI · 12h

Dive into Ambiguity: A*-Inspired Multi-Agents Commonsense Obfuscation Attack on LLM Prompts

Researchers have developed a method for attacking large language models (LLMs) by rewriting prompts into semantically similar but intentionally ambiguous versions. The A*-inspired framework uses a hierarchical rewrite strategy to obfuscate prompts gradually, aiming to induce commonsense hallucinations while preserving the original intent. The authors report higher attack success rates and greater efficiency than previous methods across various LLMs.
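To make the "A*-inspired" idea in the summary above concrete, here is a minimal Python sketch of a best-first search over prompt rewrites. It is not the paper's implementation: the rewrite rules, the `ambiguity` score, the per-level costs, and the function names are all hypothetical stand-ins chosen for illustration, and the intent-preservation check the brief mentions is omitted.

```python
import heapq
from itertools import count

# Hypothetical rewrite rules grouped by level (assumption, not from the paper):
# level 0 swaps an entity for a vague reference, level 1 loosens a phrase.
REWRITE_LEVELS = [
    {"Paris": "the city", "France": "that country"},
    {"the capital of": "the main place in"},
]

VAGUE_TERMS = ("the city", "that country", "the main place in")


def ambiguity(prompt: str) -> int:
    """Toy ambiguity score: how many vague terms appear in the prompt."""
    return sum(term in prompt for term in VAGUE_TERMS)


def neighbors(prompt: str):
    """Yield (level, rewritten_prompt) for each single applicable rewrite."""
    for level, rules in enumerate(REWRITE_LEVELS):
        for old, new in rules.items():
            if old in prompt:
                yield level, prompt.replace(old, new, 1)


def search(start: str, target_ambiguity: int, max_expansions: int = 100):
    """Best-first search ordered by f = g + h.

    g: accumulated rewrite cost (a level-k rewrite costs 1 + k, so gentler
       rewrites are tried first).
    h: vague terms still missing; each toy rule adds at most one vague term
       at cost >= 1, so h never overestimates the remaining cost.
    Returns the list of prompts from start to the first one that reaches
    the target score, or None if no such prompt is found.
    """
    tie = count()  # tie-breaker so the heap never compares prompts or paths
    h0 = max(0, target_ambiguity - ambiguity(start))
    frontier = [(h0, next(tie), 0, start, [start])]
    seen = {start}
    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, g, prompt, path = heapq.heappop(frontier)
        if ambiguity(prompt) >= target_ambiguity:
            return path
        for level, nxt in neighbors(prompt):
            if nxt in seen:
                continue
            seen.add(nxt)
            g2 = g + 1 + level
            h2 = max(0, target_ambiguity - ambiguity(nxt))
            heapq.heappush(frontier, (g2 + h2, next(tie), g2, nxt, path + [nxt]))
    return None


if __name__ == "__main__":
    for step in search("What is the capital of France?", target_ambiguity=2) or []:
        print(step)
```

In a real system the toy score and rules would presumably be replaced by model-driven rewriting and evaluation; the sketch only shows how a cost-plus-heuristic ordering can steer a gradual, level-by-level rewrite.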

IMPACT This research highlights a critical vulnerability of LLMs to ambiguous prompts, which could affect their deployment in safety-critical applications and drive the development of more robust defense mechanisms.

LLMs
Agentic Mechanism Labeling
Hierarchical Rewrite Strategy
A*-inspired Factual Error Induction Framework