Researchers have developed PLAGUE, a new framework for creating multi-turn jailbreak attacks against large language models. This framework mimics lifelong learning agents, breaking down attacks into three phases: priming, planning, and finishing. PLAGUE has demonstrated significant success, improving attack success rates by over 30% on models like OpenAI's o3 and Anthropic's Claude Opus 4.1, which are known for their resistance to such exploits. AI
IMPACT This research highlights vulnerabilities in advanced LLMs, potentially guiding developers in strengthening safety measures against sophisticated multi-turn exploits.
RANK_REASON Academic paper detailing a new framework for LLM attacks. [lever_c_demoted from research: ic=1 ai=1.0]
AI-generated summary · Google Gemini · from 1 sources. How we write summaries →