MLingualFC: Evaluating Jailbreak Vulnerabilities in Multilingual Vision-Language Models
Researchers have developed new methods for detecting and evaluating jailbreak vulnerabilities in large language models (LLMs) and vision-language models (VLMs) across multiple languages. One approach, MLJailDe, uses back-translation and relative-distance constraints to create a multilingual dataset and improve cross-lingual generalization in LLM jailbreak detection, achieving a 97.1% F1 score on unseen languages. A separate study introduces MLingualFC, a VLM benchmark that encodes harmful instructions as flowchart images in five languages. It reveals significant multilingual safety gaps and shows that visual attacks can bypass safety alignment across languages, with success rates that vary by script.
AI IMPACT: Highlights critical safety gaps in multilingual AI models that call for improved cross-lingual safety alignment and evaluation.