The Third Perception Test challenge, held alongside ICCV 2025, aimed to benchmark video models and assess progress in multimodal perception. This year's challenge emphasized task unification, presenting five consolidated tracks including unified video QA, object tracking, and action localization. A novel subset reformulated perception tasks into multiple-choice video QA questions, highlighting current models' difficulties in handling diverse tasks through unified interfaces.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Highlights challenges in current multimodal models for unified perception tasks, potentially guiding future research directions.
RANK_REASON This is a summary of an academic challenge and paper presented at a conference.