Perception Test 2025 challenge unifies video QA, tracking, and action localization tasks

By PulseAugur Editorial · 1 source · 2026-04-30 04:00

The Third Perception Test challenge, held alongside ICCV 2025, aimed to benchmark video models and assess progress in multimodal perception. This year's challenge emphasized task unification, presenting five consolidated tracks including unified video QA, object tracking, and action localization. A novel subset reformulated perception tasks into multiple-choice video QA questions, highlighting current models' difficulties in handling diverse tasks through unified interfaces.

IMPACT Highlights challenges in current multimodal models for unified perception tasks, potentially guiding future research directions.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Joseph Heyward, Nikhil Parthasarathy, Tyler Zhu, Aravindh Mahendran, João Carreira, Dima Damen, Andrew Zisserman, Viorica Pătrăucean · 2026-04-30 04:00

Perception Test 2025: Challenge Summary and a Unified VQA Extension

arXiv:2601.06287v2 · Abstract: The Third Perception Test challenge was organised as a full-day workshop alongside the IEEE/CVF International Conference on Computer Vision (ICCV) 2025. Its primary goal is to benchmark state-of-the-art video models and measure …
