VisInject: Disruption != Injection -- A Dual-Dimension Evaluation of Universal Adversarial Attacks on Vision-Language Models

New research finds universal adversarial attacks on vision-language models are less effective than expected

By the PulseAugur editorial team · [1 source] · 2026-05-06 04:00

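Researchers have developed VisInject, a new evaluation method that separates general disruption from precise injection in adversarial attacks on vision-language models. Their results show that although many attacks can disrupt a model's output, the success rate for precisely injecting a specific concept is far lower than previously reported. The study used DeepSeek-V4-Pro and Claude Opus 4.7 for evaluation and released a dataset of adversarial images and model responses to support further research.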
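The distinction between disruption and injection can be illustrated with a minimal sketch. It is not the paper's implementation: the `Trial` record, the two criteria, and the sample responses are all hypothetical, and plain string matching stands in for whatever judging procedure the authors actually used.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    clean_response: str  # model output for the unmodified image
    adv_response: str    # model output for the adversarial image
    target: str          # concept the attacker wants the model to emit


def is_disrupted(t: Trial) -> bool:
    # Assumed criterion: any change in the normalized output counts as disruption.
    return t.clean_response.strip().lower() != t.adv_response.strip().lower()


def is_injected(t: Trial) -> bool:
    # Assumed criterion: the target appears in the adversarial response
    # but not in the clean one.
    target = t.target.lower()
    return target in t.adv_response.lower() and target not in t.clean_response.lower()


def evaluate(trials):
    # Report the two dimensions separately instead of one attack success rate.
    n = len(trials)
    if n == 0:
        return {"disruption_rate": 0.0, "injection_rate": 0.0}
    return {
        "disruption_rate": sum(is_disrupted(t) for t in trials) / n,
        "injection_rate": sum(is_injected(t) for t in trials) / n,
    }


# Made-up examples, for illustration only.
trials = [
    Trial("A cat on a sofa.", "A cat on a sofa.", "visit example.com"),
    Trial("A cat on a sofa.", "A blurry shape.", "visit example.com"),
    Trial("A cat on a sofa.", "Please visit example.com now.", "visit example.com"),
    Trial("A dog in a park.", "Something unclear.", "visit example.com"),
]
scores = evaluate(trials)
print(scores)
```

In this toy data three of the four outputs change but only one contains the target, so a single combined success rate would overstate how often the attacker's concept is actually injected.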

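Impact: Introduces a more fine-grained evaluation approach for adversarial attacks, which may help in developing more robust vision-language models.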

Ranking rationale: This is an academic paper detailing a new evaluation method for adversarial attacks on vision-language models.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Pang Liu, Yingjie Lao · 2026-05-06 04:00

VisInject: Disruption != Injection -- A Dual-Dimension Evaluation of Universal Adversarial Attacks on Vision-Language Models

arXiv:2605.01449v1 Announce Type: cross Abstract: Universal adversarial attacks on aligned multimodal large language models are increasingly reported with attack success rates in the 60-80% range, suggesting the visual modality is highly vulnerable to imperceptible perturbations …
