New framework reveals vulnerability of AI judges to adversarial attacks

By PulseAugur Editorial · [1 source] · 2026-06-16 04:00

Researchers have introduced RobustMLLMJudge, a framework for assessing the adversarial robustness of Multimodal Large Language Models (MLLMs) used as automated judges for tasks such as image quality and safety assessment. The study found that current MLLM judges are susceptible to attacks that inflate scores, and it proposes the Manifold-Guided Semantic Induction Attack (MGSIA), a method for crafting more effective and transferable adversarial attacks. The findings point to a need for more robust MLLM judges if automated evaluation systems are to remain reliable.

IMPACT Highlights the need for more robust AI judges, potentially impacting the development and deployment of AI evaluation systems.

RANK_REASON The cluster contains an academic paper detailing a new framework and attack method for evaluating AI model robustness.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Zihan Wang, Guansong Pang, Zelin Liu, Wenjun Miao, Jin Zheng, Xiao Bai · 2026-06-16 04:00

On the Adversarial Robustness of Multimodal LLM Judges

arXiv:2606.15608v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) are increasingly used as automated judges, e.g., for image quality and safety assessment. However, their adversarial robustness remains largely unexplored, threatening the fairness and reliab…
