New PLAGUE framework boosts LLM jailbreak success rates

By PulseAugur Editorial · [1 source] · 2026-06-09 04:00

Researchers have developed PLAGUE, a plug-and-play framework for generating multi-turn jailbreak attacks against large language models. Inspired by lifelong-learning agents, it breaks an attack into three phases: priming, planning, and finishing. The researchers report that PLAGUE improves attack success rates by more than 30% on models such as OpenAI's o3 and Anthropic's Claude Opus 4.1, which are known for their resistance to such exploits.

IMPACT This research highlights vulnerabilities in advanced LLMs, potentially guiding developers in strengthening safety measures against sophisticated multi-turn exploits.

RANK_REASON Academic paper detailing a new framework for LLM attacks.



AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Neeladri Bhuiya, Madhav Aggarwal, Diptanshu Purwar · 2026-06-09 04:00

PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of Multi-turn Exploits

arXiv:2510.17947v3. Abstract: Large Language Models (LLMs) are improving at an exceptional rate. With the advent of agentic workflows, multi-turn dialogue has become the de facto mode of interaction with LLMs for completing long and complex tasks. Whil…

