Researchers have developed PLAGUE, a new framework for creating multi-turn jailbreak attacks against large language models. This framework mimics lifelong learning agents, breaking down attacks into three phases: priming, planning, and finishing. PLAGUE has demonstrated significant success, improving attack success rates by over 30% on models like OpenAI's o3 and Anthropic's Claude Opus 4.1, which are known for their resistance to such exploits. AI
影响 This research highlights vulnerabilities in advanced LLMs, potentially guiding developers in strengthening safety measures against sophisticated multi-turn exploits.
排序理由 Academic paper detailing a new framework for LLM attacks. [lever_c_demoted from research: ic=1 ai=1.0]
AI 生成摘要 · Google Gemini · 来自 1 个来源。 我们如何撰写摘要 →