Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 6h

Revisiting Active Speaker Detection: An In-the-Wild Benchmark for Generalization and Robustness

Researchers have introduced UniTalk, a new dataset designed to improve active speaker detection (ASD) models by focusing on real-world conditions. Unlike previous benchmarks that primarily used old movies, UniTalk includes diverse video types with underrepresented languages, noisy backgrounds, and crowded scenes. Evaluations show that current state-of-the-art models perform poorly on UniTalk, indicating that ASD is still an unsolved problem in realistic settings. Models trained on UniTalk, however, demonstrate better generalization to other contemporary datasets.

IMPACT This new dataset could drive significant improvements in the robustness and generalization of active speaker detection models for real-world applications.

Hugging Face
arXiv
AVA
UniTalk
Le Thien Phuc Nguyen