On the Adversarial Robustness of Multimodal LLM Judges
Researchers have introduced RobustMLLMJudge, a framework for assessing the adversarial robustness of Multimodal Large Language Models (MLLMs) used as judges for tasks such as image quality and safety assessment. The study found that current MLLM judges are susceptible to attacks that inflate their scores. It also proposes a new method, the Manifold-Guided Semantic Induction Attack (MGSIA), for creating more effective and more transferable adversarial attacks. The findings point to a critical need for more robust MLLM judges if automated evaluation systems are to be reliable.
IMPACT: Highlights the need for more robust AI judges, which could affect how AI evaluation systems are developed and deployed.