One Perturbation, Two Failure Modes: Probing VLM Safety via Embedding-Guided Typographic Perturbations

Researchers probe VLM safety with embedding-guided typographic attacks

By the PulseAugur editorial team · [2 sources] · 2026-04-28 01:21

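Researchers have developed a method for probing safety vulnerabilities in vision language models (VLMs) through typographic prompt injection. Their study finds that multimodal embedding distance is a strong predictor of attack success rate, which gives a model-agnostic proxy for vulnerability. The technique serves as a red-teaming tool for testing factors such as perceptual readability and safety alignment without direct access to the target model.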
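To make the idea of an embedding-distance proxy concrete, here is a minimal sketch. The summary does not say which embedding model or distance metric the authors use, so cosine distance, Pearson correlation, and the helper names `cosine_distance` and `pearson` are all assumptions chosen for illustration. The vectors would come from some multimodal encoder that embeds both the intended text and the rendered image; that encoder is not shown here.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors.

    Assumption: cosine distance stands in for the paper's
    "multimodal embedding distance", which the summary does not define.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a zero vector has no direction")
    return 1.0 - dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences.

    Hypothetical use: xs are per-rendering embedding distances and
    ys are measured attack success rates for the same renderings.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

In a setup like this, one would compute a distance for each rendering of a prompt and check how strongly those distances track the success rates measured on a target model. A strong correlation is what would justify using the distance as a proxy when the target model is not accessible.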

Impact: New red-teaming techniques could speed up the discovery of safety flaws in multimodal models.

Ranking rationale: An academic paper detailing a new method for probing VLM safety vulnerabilities.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.CV TIER_1 English(EN) · Ravikumar Balakrishnan, Sanket Mendapara · 2026-04-29 04:00

One Perturbation, Two Failure Modes: Probing VLM Safety via Embedding-Guided Typographic Perturbations

arXiv:2604.25102v1 Announce Type: new Abstract: Typographic prompt injection exploits vision language models' (VLMs) ability to read text rendered in images, posing a growing threat as VLMs power autonomous agents. Prior work typically focuses on maximizing attack success rate (ASR…
arXiv cs.CV TIER_1 English(EN) · Sanket Mendapara · 2026-04-28 01:21

One Perturbation, Two Failure Modes: Probing VLM Safety via Embedding-Guided Typographic Perturbations

Typographic prompt injection exploits vision language models' (VLMs) ability to read text rendered in images, posing a growing threat as VLMs power autonomous agents. Prior work typically focuses on maximizing attack success rate (ASR) but does not explain why certain render…