Researchers have developed ContextualJailbreak, an evolutionary red-teaming strategy designed to find vulnerabilities in large language models. This black-box approach uses simulated multi-turn dialogues and a graded harm score to guide its search for jailbreak attacks. The method achieved 100% attack success rates on several open-source models and demonstrated significant transferability to closed frontier models, though with notable differences in robustness across providers.
Impact: This research highlights new attack vectors against LLMs, which may influence future safety alignment strategies and model development.
Ranking rationale: The cluster contains an arXiv paper detailing a new method for red-teaming LLMs.
- claude-opus-4-7
- claude-sonnet-4-6
- ContextualJailbreak
- gemini-3-flash
- gpt-4o-mini
- gpt-5
- gpt-oss:120B
- gpt-oss:20B
- HarmBench
- llama3.1:70B
- Mario Rodriguez Bejar
- qwen3-8B
AI-generated summary · Google Gemini · From 2 sources.