Brief · PulseAugur

TOOL · arXiv cs.CL English(EN) · 7h

Semantic-Preserving Prompt Hijacking: A Black-Box Adversarial Attack on Auto-Prompt Optimization

Researchers have developed Adaptive Greedy Local Search, a black-box adversarial attack designed to hijack the auto-prompt optimization modules within large language models. The attack makes small changes to the user's input that stay semantically close to the original text yet shift the meaning of the model's output. Experiments on several LLMs indicate that the method is more successful than existing approaches at achieving its attack goals under similar semantic constraints.
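To make the idea concrete, here is a minimal sketch of a plain greedy local search under a similarity constraint. It is not the paper's algorithm: the "adaptive" component is not modelled, and `jaccard_similarity`, `toy_score`, the `substitutes` table, and the 0.7 threshold are all hypothetical stand-ins (a real attack would query the target system for the score and use a semantic similarity model).

```python
from typing import Callable, Dict, List


def jaccard_similarity(a: List[str], b: List[str]) -> float:
    """Toy stand-in for a semantic similarity model: token-set overlap."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def greedy_local_search(
    tokens: List[str],
    substitutes: Dict[str, List[str]],
    score: Callable[[List[str]], float],
    min_similarity: float,
    max_steps: int = 10,
) -> List[str]:
    """Apply the single best word substitution per step until none helps.

    `score` is treated as a black box (hypothetical attack objective).
    Candidates that drift too far from the original input are rejected.
    """
    original = list(tokens)
    current = list(tokens)
    current_score = score(current)
    for _ in range(max_steps):
        best_candidate = None
        best_score = current_score
        for i, word in enumerate(current):
            for alt in substitutes.get(word, []):
                candidate = current[:i] + [alt] + current[i + 1:]
                # Enforce the similarity constraint against the original input.
                if jaccard_similarity(original, candidate) < min_similarity:
                    continue
                candidate_score = score(candidate)
                if candidate_score > best_score:
                    best_candidate, best_score = candidate, candidate_score
        if best_candidate is None:
            break  # local optimum: no allowed edit improves the score
        current, current_score = best_candidate, best_score
    return current


def toy_score(tokens: List[str]) -> float:
    """Hypothetical objective: counts attacker-preferred words."""
    return float(sum(t in {"brief", "terse"} for t in tokens))


prompt = "write a short summary of the report".split()
substitutes = {"short": ["brief"], "summary": ["terse"], "report": ["terse"]}
result = greedy_local_search(prompt, substitutes, toy_score, min_similarity=0.7)
print(" ".join(result))  # write a brief summary of the report
```

In this toy run the search accepts one substitution and then stops, because every further edit would push the similarity below the threshold. That is the trade-off the brief describes: the attacker's gain is bounded by how close the altered input must stay to the original.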

IMPACT Highlights a vulnerability in LLM auto-optimization features, potentially impacting model security and trustworthiness.

Adaptive Greedy Local Search
Chong Zhang