PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of Multi-turn Exploits
Researchers have developed PLAGUE, a framework for generating multi-turn jailbreak attacks against large language models. Modeled on lifelong-learning agents, it breaks an attack into three phases: priming, planning, and finishing. PLAGUE improves attack success rates by over 30% on models such as OpenAI's o3 and Anthropic's Claude Opus 4.1, which are known for their resistance to such exploits.
AI IMPACT: This research highlights vulnerabilities in advanced LLMs and could guide developers in strengthening safety measures against sophisticated multi-turn exploits.