PulseAugur
English(EN) Same Model, Different Weakness: How Language and Modality Reshape the Jailbreak Attack Surface in Frontier MLLMs

Jailbreak vulnerabilities in multimodal large language models vary by language and modality

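A new study shows that the vulnerability of frontier multimodal large language models (MLLMs) to jailbreak attacks depends significantly on language and modality. The researchers found that, compared with English, linguistically framed attacks are less effective in Spanish, whereas visually explicit multimodal attacks are more effective. This suggests that alignment failures operate through distinct language-specific and modality-specific mechanisms, so that safety rankings differ across languages. The findings underscore that safety evaluation frameworks need to account for these cross-lingual and cross-modal differences.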

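Impact: Shows that current safety evaluations may not generalize across languages, and that redesigned frameworks are needed to support global deployment of multimodal large language models.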

Ranking rationale: This cluster contains an academic paper detailing new research on large language model safety.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 5 sources.

Sources [5]

  1. arXiv cs.AI TIER_1 English(EN) · Seokil Ham, Jaehyuk Jang, Wonjun Lee, Changick Kim

    Jailbreak to Protect: Buffering and Hardening via Temporary Jailbreaking for Safe Fine-Tuning of Large Language Models

    arXiv:2605.24550v1 Announce Type: new Abstract: Fine-tuning-as-a-Service (FaaS) enables personalization of large language models (LLMs), but it can weaken safety-alignment under harmful fine-tuning attacks. Recent work has shown that activating harmful-behavior modules during fin…

  2. arXiv cs.AI TIER_1 English(EN) · Xiaodong Wu, Xiangman Li, Qi Li, Lingshuang Liu, Jianbing Ni

    SoK: A Comprehensive Security Analysis of the Jailbreak Resilience of GPT and DeepSeek Models

    arXiv:2506.18543v2 Announce Type: replace-cross Abstract: The rapid proliferation of Large Language Models (LLMs) has heightened concerns regarding their exposure to jailbreak attacks, which craft adversarial inputs designed to elicit unsafe content. Although proprietary models s…

  3. arXiv cs.AI TIER_1 English(EN) · Mengqi He, Xinyu Tian, Xin Shen, Shu Zou, Jinhong Ni, Zhaoyuan Yang, Weikang Li, Xuesong Li, Jing Zhang

    Break the Brakes, Not the Wheels: Untargeted Jailbreaks via Entropy Maximization

    arXiv:2605.10764v2 Announce Type: replace-cross Abstract: Recent studies show that gradient-based universal image jailbreaks on vision-language models (VLMs) exhibit little or no cross-model transferability, casting doubt on the feasibility of transferable multimodal jailbreaks. …

  4. arXiv cs.CL TIER_1 English(EN) · Casey Ford, Madison Van Doren, Sicheng Jin, Emily Dix

    Same Model, Different Weakness: How Language and Modality Reshape the Jailbreak Attack Surface in Frontier MLLMs

    arXiv:2605.23157v1 Announce Type: new Abstract: The attack surface of a multimodal large language model (MLLM) is language-dependent in ways that reveal the mechanistic structure of alignment failures. We present the first systematic cross-lingual, multimodal red-teaming study co…

  5. arXiv cs.CL TIER_1 English(EN) · Emily Dix

    Same Model, Different Weakness: How Language and Modality Reshape the Jailbreak Attack Surface in Frontier MLLMs

    The attack surface of a multimodal large language model (MLLM) is language-dependent in ways that reveal the mechanistic structure of alignment failures. We present the first systematic cross-lingual, multimodal red-teaming study comparing jailbreak vulnerability in US English (e…