StructBreak: Structural Cognitive Overload-Induced Safety Failures in MLLMs
Researchers have developed StructBreak, a framework for identifying safety failures in multimodal large language models (MLLMs) caused by structural cognitive overload. The overload occurs when complex reasoning tasks strain a model's safety alignment, leading it to produce outputs it would otherwise refuse. StructBreak operates in a black-box setting and reports an average attack success rate of 92% across six leading MLLMs, suggesting that current safety mechanisms are insufficient for advanced multimodal reasoning.
AI IMPACT: Highlights the vulnerability of current multimodal AI safety mechanisms to complex reasoning tasks, with potential implications for future alignment research and deployment.