New benchmark highlights safety flaws in video generation models

By PulseAugur Editorial · [3 sources] · 2026-06-01 16:12

Researchers have developed SafeGen-Bench, a benchmark for evaluating the safety of image-conditioned text-to-video generation models. It targets a specific risk: models can generate harmful videos even from safe text and image inputs. The benchmark defines 10 malicious categories that focus on temporal sequences and the behaviors depicted in the generated video. Initial evaluations show that current models struggle with safety, reaching unsafety scores as high as 44.5. They also show that unimodal guardrails are insufficient, failing 80% of the time across seven malicious categories.

IMPACT Highlights critical safety vulnerabilities in current text-to-video models, necessitating improved guardrails and evaluation methods for responsible AI development.

RANK_REASON The cluster contains two academic papers detailing new research and benchmarks in AI, specifically related to video generation.



AI-generated summary · Google Gemini · from 3 sources.


COVERAGE [3]

arXiv cs.CV TIER_1 English(EN) · Yingzi Ma, Xiaogeng Liu, Yawen Zheng, Chaowei Xiao · 2026-06-02 04:00

SafeGen-Bench: Benchmarking Safety in Image-Conditioned Text-to-Video Generation

arXiv:2606.01481v1 Announce Type: new Abstract: With the rapid advancements in text-to-image diffusion models, generative video models (T2V models) like Sora can now produce short synthetic videos from a text prompt or an initial image. However, synthetic video generation -- espe…
arXiv cs.CV TIER_1 English(EN) · Yuheng Chen, Teng Hu, Yuji Wang, Qingdong He, Lizhuang Ma, Jiangning Zhang · 2026-06-02 04:00

Spatial-Temporal Decoupled Reference Conditioning for Identity-Preserving Text-to-Video Generation

arXiv:2606.02441v1 Announce Type: new Abstract: Identity-preserving video generation (IPVG) aims to synthesize high-fidelity videos that follow text prompts while faithfully preserving a reference identity. Despite recent progress, existing IPVG methods still struggle to balance …
arXiv cs.CV TIER_1 English(EN) · Jiangning Zhang · 2026-06-01 16:12

Spatial-Temporal Decoupled Reference Conditioning for Identity-Preserving Text-to-Video Generation

Identity-preserving video generation (IPVG) aims to synthesize high-fidelity videos that follow text prompts while faithfully preserving a reference identity. Despite recent progress, existing IPVG methods still struggle to balance high-level semantic control and low-level identi…

