Pulling The REINS: Training-Free Safety Alignment of Video Diffusion Models via Representation Steering
Researchers have developed REINS (REpresentation-space INference-time Safety steering), a training-free method for aligning video diffusion models so that they do not generate unsafe content. Instead of requiring expensive safety fine-tuning, the technique steers the model's internal representations at inference time. REINS identifies a direction in the model's hidden states that separates safe from unsafe content and adds this direction to intermediate layers, which redirects harmful generations towards safe alternatives with minimal computational overhead. The method has been evaluated across multiple video diffusion models and scales, which suggests that it applies broadly rather than to a single architecture.
IMPACT Offers a computationally efficient, training-free approach to mitigate harmful content generation in video diffusion models.
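
The steering step described in the summary can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the summary does not say how the direction is found, so the difference-of-means estimate used here is an assumption, as are the function names `steering_direction` and `steer`, the strength parameter `alpha`, and the synthetic activations that stand in for real hidden states.

```python
import numpy as np


def steering_direction(safe_acts, unsafe_acts):
    """Unit vector pointing from the unsafe mean to the safe mean.

    Assumption: a difference-of-means estimate; the summary does not
    state how REINS actually computes its direction.
    """
    diff = safe_acts.mean(axis=0) - unsafe_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def steer(hidden, direction, alpha=4.0):
    """Add alpha * direction to every hidden-state vector (hypothetical helper)."""
    return hidden + alpha * direction


# Synthetic stand-ins for hidden states collected at one intermediate layer.
rng = np.random.default_rng(0)
dim = 16
axis = np.zeros(dim)
axis[0] = 1.0
safe_acts = rng.normal(size=(200, dim)) + 2.0 * axis
unsafe_acts = rng.normal(size=(200, dim)) - 2.0 * axis

direction = steering_direction(safe_acts, unsafe_acts)

# Steer a small batch of "unsafe" hidden states and compare their mean
# projection onto the direction before and after the shift.
hidden = unsafe_acts[:8]
steered = steer(hidden, direction, alpha=4.0)
before = float((hidden @ direction).mean())
after = float((steered @ direction).mean())
print(f"mean projection before: {before:.3f}, after: {after:.3f}")
```

Because the direction is normalised, the mean projection onto it rises by exactly `alpha`, which makes the steering strength easy to reason about. In a real model the same addition would be applied to the output of the chosen intermediate layers during sampling, with no change to the weights.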