New Deepfake Benchmark Reveals Steep Performance Drop for AI Detectors

By PulseAugur Editorial · [1 source] · 2026-05-28 04:00

Researchers have introduced Deepfake-Eval-2024, a benchmark for evaluating deepfake detection models on real-world, in-the-wild content. It comprises 45 hours of video, 56.5 hours of audio, and 1,975 images collected in 2024 from social media and user submissions, reflecting the latest manipulation techniques. Open-source detectors performed markedly worse on the new dataset, with AUC falling by up to 50% relative to older academic benchmarks. Commercial and fine-tuned models performed better but still did not match the accuracy of human forensic analysts.

IMPACT Highlights the need for up-to-date deepfake detection benchmarks as AI generation capabilities advance, with implications for trust and security in digital media.

RANK_REASON The cluster describes a new academic benchmark for evaluating AI-generated content, which falls under research.

paper
safety

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Nuria Alina Chandra, Hannah Lee, Ryan Murtfeldt, Lin Qiu, Arnab Karmakar, Emmanuel Tanumihardja, Kevin Farhat, Ben Caffee, Changyeon Lee, Jongwook Choi, Sejin Paik, Aerin Kim, Oren Etzioni · 2026-05-28 04:00

Deepfake-Eval-2024: A Multi-Modal In-the-Wild Benchmark of Deepfakes Circulated in 2024

arXiv:2503.02857v5. Abstract: In the age of increasingly realistic generative AI, robust deepfake detection is essential for mitigating fraud and disinformation. While many deepfake detectors report high accuracy on academic datasets, we show that thes…
