Researchers have developed ImmersedPrivacy, an interactive audio-visual framework using a Unity simulator to evaluate the privacy awareness of Vision-Language Models (VLMs) in physical environments. Their study tested 12 state-of-the-art models, revealing significant performance deficits in identifying sensitive items in complex scenes and adapting to shifting social contexts. Even the best-performing model, Gemini 1.5 Pro, struggled to balance task completion with privacy preservation when faced with conflicting commands.
AI IMPACT Highlights critical privacy gaps in current VLMs for embodied AI, suggesting a need for improved privacy-preserving capabilities in real-world applications.
RANK_REASON Academic paper presenting a new evaluation framework and empirical study of VLMs.
AI-generated summary · Google Gemini · from 1 source.