Are Neuro-Inspired Multi-Modal Vision-Language Models Resilient to Membership Inference Privacy Leakage?
New research indicates that multi-modal vision-language models (VLMs) are susceptible to privacy attacks, specifically membership inference attacks (MIAs), which can reveal whether a given example was part of a model's training data. One study proposes a neuroscience-inspired topological regularization framework that significantly reduces MIA success rates in models such as BLIP, PaliGemma 2, and ViT-GPT2 without substantially compromising their utility. A second paper shows that encoder-free VLMs, such as Gemma4 and Fuyu, carry a distinct privacy risk: their architecture lets intermediate visual tokens act as side channels, enabling recovery of recognizable image structure and even access codes, a vulnerability absent from encoder-based models.
IMPACT These findings highlight critical privacy vulnerabilities in multi-modal AI that could affect both the deployment of, and trust in, these systems.
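For readers unfamiliar with how a membership inference attack works, the simplest baseline thresholds a model's per-example loss: models tend to fit their training examples more tightly, so an unusually low loss is taken as evidence that an example was a training member. The sketch below is a minimal illustration of that generic loss-threshold baseline under the assumption that the attacker can observe the probability the model assigns to each example's true label. It is not the attack or the defense used in either study, and the function names, probabilities, and threshold are hypothetical.

```python
import math

def cross_entropy(prob_true_class: float) -> float:
    # Per-example loss: negative log-probability of the true label
    # (clamped to avoid log(0)).
    return -math.log(max(prob_true_class, 1e-12))

def loss_threshold_mia(probs: list[float], threshold: float) -> list[bool]:
    # Predict "member" (True) when the loss falls below the threshold.
    return [cross_entropy(p) < threshold for p in probs]

def attack_accuracy(member_probs: list[float],
                    nonmember_probs: list[float],
                    threshold: float) -> float:
    # Fraction of examples whose membership the attack guesses correctly.
    preds_in = loss_threshold_mia(member_probs, threshold)
    preds_out = loss_threshold_mia(nonmember_probs, threshold)
    correct = sum(preds_in) + sum(not p for p in preds_out)
    return correct / (len(preds_in) + len(preds_out))

# Hypothetical true-label probabilities for training members and non-members.
members = [0.95, 0.90, 0.85, 0.60]
nonmembers = [0.70, 0.40, 0.30, 0.92]
print(attack_accuracy(members, nonmembers, threshold=0.3))  # 0.75
```

On a balanced member/non-member set, an accuracy near 0.5 means the attacker does no better than random guessing, so a privacy defense is effective to the extent that it pushes this number toward 0.5 while preserving the model's utility.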