New attack hijacks LLM auto-optimization by preserving semantics

By PulseAugur Editorial · [1 sources] · 2026-06-16 04:00

Researchers have developed a new black-box adversarial attack method called Adaptive Greedy Local Search, designed to hijack the auto-suggestion optimization modules within large language models. This technique works by subtly altering user input to cause semantic shifts in the model's output while maintaining a high degree of semantic similarity to the original text. Experiments on various LLMs indicate that this method is more successful than existing approaches in achieving its attack goals under similar semantic constraints. AI

IMPACT Highlights a vulnerability in LLM auto-optimization features, potentially impacting model security and trustworthiness.

RANK_REASON Academic paper detailing a new adversarial attack method on LLMs. [lever_c_demoted from research: ic=1 ai=1.0]

Read on arXiv cs.CL →

paper
safety

AI-generated summary · Google Gemini · from 1 sources. How we write summaries →

COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Chong Zhang, Xiang Li, Jia Wang, Shan Liang, Haochen Xue, Xiaobo Jin · 2026-06-16 04:00

Semantic-Preserving Prompt Hijacking: A Black-Box Adversarial Attack on Auto-Prompt Optimization

arXiv:2506.18756v2 Announce Type: replace Abstract: LLMs increasingly integrate auto-suggestion optimization modules, enabling them to rewrite and display user input before generating the final response. While this design aims to enhance transparency and trust, its process of aut…

COVERAGE [1]

Semantic-Preserving Prompt Hijacking: A Black-Box Adversarial Attack on Auto-Prompt Optimization

RELATED ENTITIES

RELATED TOPICS