Forced Deferral: Manipulating Routing Decisions in Multimodal LLM Cascades
Researchers have identified a new vulnerability in multimodal large language model (MLLM) cascades, termed the Forced Deferral Attack (FDA). The attack manipulates the weak model's confidence scores so that the cascade consistently routes queries to the more computationally expensive strong model. It does this with a universal border trigger, and it outperforms existing adversarial image and prompt injection methods. The findings expose a new attack surface in MLLM cascades: an attacker can drive up compute usage without directly affecting answer accuracy.
AI IMPACT: Highlights a new vulnerability in multimodal LLM architectures that could increase operational costs and that calls for new security considerations.
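
To make the routing mechanism concrete, here is a minimal sketch of a two-model cascade that defers when the weak model's confidence falls below a threshold. It is an illustration under stated assumptions, not the researchers' setup: the `Cascade` class, the threshold of 0.7, the cost units, the toy models, and the `[trigger]` text marker (a stand-in for the image border trigger) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Cascade:
    """Toy confidence-threshold cascade (all values are illustrative assumptions)."""
    weak: Callable[[str], Tuple[str, float]]  # returns (answer, confidence in [0, 1])
    strong: Callable[[str], str]              # returns an answer
    threshold: float = 0.7                    # hypothetical deferral threshold
    weak_cost: float = 1.0                    # hypothetical cost units
    strong_cost: float = 10.0

    def route(self, query: str) -> Tuple[str, str, float]:
        """Return (answer, model_used, total_cost) for one query."""
        answer, confidence = self.weak(query)
        if confidence >= self.threshold:
            # Weak model is confident enough: answer cheaply.
            return answer, "weak", self.weak_cost
        # Otherwise defer; the weak model's cost has already been paid.
        return self.strong(query), "strong", self.weak_cost + self.strong_cost


def toy_weak(query: str) -> Tuple[str, float]:
    # Assumption: the trigger lowers confidence but leaves the answer unchanged.
    confidence = 0.2 if "[trigger]" in query else 0.9
    return "cat", confidence


def toy_strong(query: str) -> str:
    return "cat"


cascade = Cascade(weak=toy_weak, strong=toy_strong)
clean = cascade.route("what animal is this?")
attacked = cascade.route("[trigger] what animal is this?")
print(clean)
print(attacked)
```

In this toy setting both queries receive the same answer, but the triggered query is served by the strong model at a higher total cost, which mirrors the compute-inflation effect described above.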