RESEARCH · arXiv cs.CL · 4d · [5 sources]

Same Model, Different Weakness: How Language and Modality Reshape the Jailbreak Attack Surface in Frontier MLLMs

A new study finds that the vulnerability of frontier multimodal large language models (MLLMs) to jailbreak attacks depends strongly on both language and modality. The researchers report that linguistic framing attacks were less effective in Spanish than in English, while visually explicit multimodal attacks became more potent. This suggests that alignment failures operate through distinct language- and modality-specific mechanisms, so the same models receive different safety rankings in different languages. The findings indicate that safety evaluation frameworks need to account for these cross-lingual and cross-modal differences.

IMPACT Demonstrates that current safety evaluations may not generalize across languages, necessitating redesigned frameworks for global MLLM deployment.

GPT-5
Pixtral Large
Claude Sonnet 4.5
US English
Mexican Spanish
Qwen Omni