Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs
Researchers have developed a novel backdoor attack called Turn-based Structural Triggers (TST) that exploits the dialogue structure of Large Language Models (LLMs) rather than user-visible prompts. This attack uses the turn index within a conversation as the trigger, allowing a backdoored model to execute malicious behaviors at specific points in a dialogue without any discernible input trigger. TST demonstrated a high attack success rate across multiple LLM families while maintaining normal performance on non-triggered tasks, highlighting a new vulnerability in multi-turn conversational AI systems. AI
IMPACT Reveals a new attack vector for LLMs, necessitating the development of structure-aware auditing methods beyond prompt inspection.