
TOOL · arXiv cs.AI English(EN) · 8h

Erased but Not Forgotten: How Backdoors Compromise Concept Erasure

Researchers have identified a significant vulnerability in concept erasure techniques for text-to-image diffusion models, exploited by an attack they term the Erasure Evasion Backdoor (EEB). An adversary embeds a hidden trigger linked to a concept slated for removal, so that harmful content associated with that concept can still be generated after erasure has been applied. The EEB was shown to be effective across multiple state-of-the-art erasure methods, with substantial success rates in generating unwanted outputs, including celebrity likenesses and explicit imagery.

IMPACT Highlights a critical flaw in AI safety mechanisms, necessitating new methods to ensure genuine concept removal and prevent misuse.

text-to-image diffusion models
Tobias Braun
Erasure Evasion Backdoor