PulseAugur

New research probes VLM susceptibility to visual persuasion and influence

Researchers are developing new frameworks to evaluate how susceptible Vision-Language Models (VLMs) are to multimodal persuasion and visual influence. One study introduces MMPersuade to test agent-to-agent persuasion using images and psychological strategies, finding that multimodal inputs are more effective than text alone and that susceptibility varies by domain and model architecture. Another paper proposes a method that systematically perturbs images and analyzes how VLMs' visual preferences shift, aiming to uncover vulnerabilities and improve auditing. A third study focuses on Vision-Language-Action (VLA) models in autonomous driving, using a perturbation framework to understand how visual information grounds driving behavior and to support the development of safer systems.

IMPACT These studies highlight critical vulnerabilities in multimodal AI systems, informing the development of more robust and trustworthy AI agents across various applications.

RANK_REASON Multiple arXiv papers introducing new frameworks and analyses for evaluating VLM and VLA model behavior.


AI-generated summary · Google Gemini · from 4 sources.

COVERAGE [4]

  1. arXiv cs.CL TIER_1 English(EN) · Haoyi Qiu, Yilun Zhou, Pranav Narayanan Venkit, Kung-Hsiang Huang, Jiaxin Zhang, Nanyun Peng, Chien-Sheng Wu

    Seeing is Believing? Evaluating Vision-Language Model Susceptibility in Agent-to-Agent Multimodal Persuasion

    arXiv:2510.22768v2. Abstract: As autonomous agents increasingly interact, they inevitably attempt to influence one another. While prior work in text-only settings has explored the dynamics of Agent-to-Agent (A2A) persuasion, the rise of Vision-Language Model…

  2. arXiv cs.AI TIER_1 English(EN) · Manuel Cherep, Pranav M R, Pattie Maes, Nikhil Singh

    Visual Persuasion: What Influences Decisions of Vision-Language Models?

    arXiv:2602.15278v2. Abstract: The web is littered with images, once created for human consumption and now increasingly interpreted by agents using vision-language models (VLMs). These agents make visual decisions at scale, deciding what to click, recom…

  3. arXiv cs.AI TIER_1 English(EN) · Jingtao He, Hongliang Lu, Xiaoyun Qiu, Yixuan Wang, Xinhu Zheng

    Does Visual Information Play a Decisive Role in Vision-Language-Action Model Driving Behavior?

    arXiv:2605.31041v1. Abstract: Vision-Language-Action (VLA) models have demonstrated promising capability in autonomous driving, highlighting the potential of unified multimodal architectures for jointly modeling perception and planning. However, how current VL…

  4. arXiv cs.CV TIER_1 English(EN) · Xinhu Zheng

    Does Visual Information Play a Decisive Role in Vision-Language-Action Model Driving Behavior?

    Vision-Language-Action (VLA) models have demonstrated promising capability in autonomous driving, highlighting the potential of unified multimodal architectures for jointly modeling perception and planning. However, how current VLA-based driving behavior is grounded in visual inf…