Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation

New research reveals escalating jailbreak vulnerabilities in LLMs and LALMs

By the PulseAugur editorial team · [4 sources] · 2026-05-28 14:53

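Three new research papers examine vulnerabilities and defenses in large language models (LLMs) and large audio language models (LALMs). The first presents a taxonomy of audio jailbreak attacks and defenses, and highlights that current defenses often sacrifice usability to gain robustness. The second is a comprehensive review of LLM vulnerabilities that categorizes attacks and defenses and identifies research gaps in areas such as resilient alignment and automated detection. The third introduces a "jailbreak scaling law," showing that adversarial prompts can shift the growth of attack success rate from polynomial to exponential, a pattern observed across a variety of LLMs and attack methods.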

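Impact: The new research highlights escalating risks in LLM and LALM security and underscores the need for stronger, more usable defenses against sophisticated jailbreak techniques.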

Ranking rationale: This cluster contains three academic papers detailing research on LLM and LALM vulnerabilities and defenses.


AI-generated summary · Google Gemini · from 4 sources.

Sources [4]

arXiv cs.AI TIER_1 English(EN) · Bo-Han Feng, Yu-Hsuan Li Liang, Chien-Feng Liu, You-Hsuan Chang, Yun-Nung Chen · 2026-05-29 04:00

Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation

arXiv:2605.30031v1 Announce Type: cross Abstract: Large Audio Language Models (LALMs) expand jailbreak risks from token-level prompting to the full speech perception-to-reasoning pipeline, where unsafe behavior can be induced through semantics, acoustic style, signal artifacts, o…

arXiv cs.AI TIER_1 English(EN) · Benji Peng, Hanxuan Chen, Keyu Chen, Qian Niu, Ziqian Bi, Ming Liu, Pohsun Feng, Tianyang Wang, Lawrence K. Q. Yan, Yizhu Wen, Yichao Zhang, Caitlyn Heqi Yin, Xinyuan Song, Riyang Bao, Jiacheng Shi · 2026-05-29 04:00

Jailbreaking and Mitigation of Vulnerabilities in Large Language Models

arXiv:2410.15236v4 Announce Type: replace-cross Abstract: Large Language Models (LLMs) have transformed artificial intelligence by advancing natural language understanding and generation, enabling applications across fields beyond healthcare, software engineering, and conversatio…

arXiv cs.AI TIER_1 English(EN) · Indranil Halder, Annesya Banerjee, Cengiz Pehlevan · 2026-05-29 04:00

Jailbreak Scaling Laws for Large Language Models: Polynomial-Exponential Crossover

arXiv:2603.11331v3 Announce Type: replace-cross Abstract: Adversarial attacks can reliably steer safety-aligned large language models toward unsafe behavior. Empirically, we find that adversarial prompt-injection attacks can amplify attack success rate from the slow polynomial gr…

arXiv cs.AI TIER_1 English(EN) · Yun-Nung Chen · 2026-05-28 14:53

Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation

Large Audio Language Models (LALMs) expand jailbreak risks from token-level prompting to the full speech perception-to-reasoning pipeline, where unsafe behavior can be induced through semantics, acoustic style, signal artifacts, or internal representations. Existing work studies …
