Brief · PulseAugur

RESEARCH · arXiv cs.AI · 6h · [2 sources]

Are Neuro-Inspired Multi-Modal Vision-Language Models Resilient to Membership Inference Privacy Leakage?

New research indicates that multi-modal vision-language models (VLMs) are susceptible to privacy attacks, specifically membership inference attacks (MIAs), in which an adversary determines whether a given sample was part of a model's training data and can thereby leak sensitive information about that data. One study proposes a neuroscience-inspired topological regularization framework that significantly reduces MIA success rates in models like BLIP, PaliGemma 2, and ViT-GPT2 without substantially compromising their utility. Another paper highlights that encoder-free VLMs, such as Gemma4 and Fuyu, present a distinct privacy risk: their architecture allows intermediate visual tokens to act as side channels, enabling the recovery of recognizable image structures and even access codes, a vulnerability the paper says is not present in encoder-based models.
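For readers unfamiliar with membership inference, the simplest form of the attack can be sketched as a loss threshold: samples on which a model has unusually low loss are guessed to be training members. The sketch below is illustrative only and is not the attack or defense from either paper. The function names, the threshold, and the synthetic loss values are all assumptions made for the example.

```python
import random


def loss_threshold_attack(losses, threshold):
    """Guess membership: flag a sample as a training member if its loss is below the threshold."""
    return [loss < threshold for loss in losses]


def attack_accuracy(member_losses, nonmember_losses, threshold):
    """Fraction of samples whose membership status the attack guesses correctly."""
    member_preds = loss_threshold_attack(member_losses, threshold)
    nonmember_preds = loss_threshold_attack(nonmember_losses, threshold)
    correct = sum(member_preds) + sum(not p for p in nonmember_preds)
    return correct / (len(member_losses) + len(nonmember_losses))


# Synthetic per-sample losses (assumed values, not measurements from any model):
# training members tend to have lower loss than unseen samples.
rng = random.Random(0)
member_losses = [rng.gauss(0.5, 0.2) for _ in range(1000)]
nonmember_losses = [rng.gauss(1.0, 0.2) for _ in range(1000)]

acc = attack_accuracy(member_losses, nonmember_losses, threshold=0.75)
print(f"attack accuracy: {acc:.3f}")
```

An accuracy near 0.5 would mean the attacker does no better than chance. Defenses such as the regularization described above aim to shrink the gap between member and non-member behavior so that this kind of signal becomes harder to exploit.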

IMPACT These findings highlight critical privacy vulnerabilities in multi-modal AI that could affect both the deployment of these systems and trust in them.