Researchers unveil new stealthy backdoor attacks on AI models using diffusion and style features

By PulseAugur Editorial Team · [2 sources] · 2026-05-06 04:00

Researchers have proposed two new backdoor attacks on advanced AI models, one targeting Vision-Language Models (VLMs) and one targeting Diffusion Models (DMs). The first, CBV, uses diffusion models to create natural-looking poisoned examples for VLMs by subtly altering the image generation process and concentrating modifications on semantically important regions. The second, Gungnir, exploits stylistic features of images as stealthy triggers for diffusion models, making the attack harder to detect and able to bypass existing defenses.

Impact: New attack vectors highlight vulnerabilities in VLMs and diffusion models, necessitating advances in AI safety and defense mechanisms.

Ranking rationale: Two research papers detailing novel backdoor attack methods on AI models.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Ji Guo, Xiaolong Qin, Cencen Liu, Jielei Wang, Jierun Chen, Wenbo Jiang · 2026-05-06 04:00

CBV: Clean-label Backdoor Attacks on Vision Language Models via Diffusion Models

arXiv:2605.02202v1 Announce Type: new Abstract: Vision-Language Models (VLMs) have achieved remarkable success in tasks such as image captioning and visual question answering (VQA). However, as their applications become increasingly widespread, recent studies have revealed that V…

arXiv cs.CV TIER_1 English(EN) · Lei Zhang, Yu Pan, Bingrong Dai, Lin Wang · 2026-05-08 04:00

Gungnir: Exploiting Stylistic Features in Images for Backdoor Attacks on Diffusion Models

arXiv:2502.20650v5 Announce Type: replace Abstract: Diffusion Models (DMs) have achieved remarkable success in image generation, yet recent studies reveal their vulnerability to backdoor attacks, where adversaries manipulate outputs via covert triggers embedded in inputs. Existin…
