PulseAugur
English(EN) Same Model, Different Weakness: How Language and Modality Reshape the Jailbreak Attack Surface in Frontier MLLMs

Jailbreak Vulnerabilities in Multimodal Large Language Models Vary by Language and Modality

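A new study shows that the vulnerability of frontier multimodal large language models (MLLMs) to jailbreak attacks depends significantly on language and modality. The researchers found that linguistically framed attacks were less effective in Spanish than in English, while visually explicit multimodal attacks were more effective. This suggests that alignment failures operate through distinct language-specific and modality-specific mechanisms, so that safety rankings differ from one language to another. The findings underscore that safety evaluation frameworks need to account for these cross-lingual and cross-modal differences.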

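Impact: Shows that current safety evaluations may not generalize across languages, and that redesigned frameworks are needed to support global deployment of multimodal large language models.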

Ranking rationale: This cluster contains an academic paper detailing new research on large language model safety.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 5 sources.

Sources [5]

  1. arXiv cs.AI TIER_1 English(EN) · Seokil Ham, Jaehyuk Jang, Wonjun Lee, Changick Kim ·

    Jailbreak to Protect: Buffering and Reinforcing via Temporary Jailbreaking for Safe Fine-Tuning in Large Language Models

    arXiv:2605.24550v1 Announce Type: new Abstract: Fine-tuning-as-a-Service (FaaS) enables personalization of large language models (LLMs), but it can weaken safety-alignment under harmful fine-tuning attacks. Recent work has shown that activating harmful-behavior modules during fin…

  2. arXiv cs.AI TIER_1 English(EN) · Xiaodong Wu, Xiangman Li, Qi Li, Lingshuang Liu, Jianbing Ni ·

    SoK: A Comprehensive Security Analysis of Jailbreak Resilience in GPT and DeepSeek Models

    arXiv:2506.18543v2 Announce Type: replace-cross Abstract: The rapid proliferation of Large Language Models (LLMs) has heightened concerns regarding their exposure to jailbreak attacks, which craft adversarial inputs designed to elicit unsafe content. Although proprietary models s…

  3. arXiv cs.AI TIER_1 English(EN) · Mengqi He, Xinyu Tian, Xin Shen, Shu Zou, Jinhong Ni, Zhaoyuan Yang, Weikang Li, Xuesong Li, Jing Zhang ·

    Break the Brake, Not the Wheel: Untargeted Jailbreak via Entropy Maximization

    arXiv:2605.10764v2 Announce Type: replace-cross Abstract: Recent studies show that gradient-based universal image jailbreaks on vision-language models (VLMs) exhibit little or no cross-model transferability, casting doubt on the feasibility of transferable multimodal jailbreaks. …

  4. arXiv cs.CL TIER_1 English(EN) · Casey Ford, Madison Van Doren, Sicheng Jin, Emily Dix ·

    Same Model, Different Weakness: How Language and Modality Reshape the Jailbreak Attack Surface in Frontier MLLMs

    arXiv:2605.23157v1 Announce Type: new Abstract: The attack surface of a multimodal large language model (MLLM) is language-dependent in ways that reveal the mechanistic structure of alignment failures. We present the first systematic cross-lingual, multimodal red-teaming study co…

  5. arXiv cs.CL TIER_1 English(EN) · Emily Dix ·

    Same Model, Different Weakness: How Language and Modality Reshape the Jailbreak Attack Surface in Frontier MLLMs

    The attack surface of a multimodal large language model (MLLM) is language-dependent in ways that reveal the mechanistic structure of alignment failures. We present the first systematic cross-lingual, multimodal red-teaming study comparing jailbreak vulnerability in US English (e…