Researchers have developed AE-CoT, an adaptive evolutionary framework for jailbreaking large reasoning models (LRMs). The method rewrites harmful goals into mild-seeming prompts and decomposes them into reasoning fragments to create jailbreak candidates. It then runs an evolutionary search, using crossover and mutation to diversify the candidates, while an independent scoring model rates the harmfulness of each output and steers the search toward more harmful generations. In the reported experiments, AE-CoT outperforms existing jailbreak methods across multiple models and datasets.
AI impact: This research highlights new vulnerabilities in LLMs, which may affect their safe deployment and prompt further research into robust defense mechanisms.
Ranking rationale: The cluster contains an academic paper detailing a new method for jailbreaking LLMs.
AI-generated summary · Google Gemini · from 1 source.