Researchers have introduced SportsTime, a benchmark dataset for evaluating multimodal large language models (MLLMs) on long-form sports video understanding. The dataset includes over 14,000 question-answer pairs and 50,000 temporal evidence annotations, targeting the challenge of locating and integrating sparse evidence. The authors also propose Chain-of-Time Reasoning (CoTR), a method that enhances temporal compositional reasoning by grounding evidence composition and using an iterative evidence-seeking loop during inference.
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Advances multimodal reasoning for complex video analysis, potentially improving applications in sports analytics and content summarization.
RANK_REASON Academic paper introducing a new benchmark dataset and reasoning method for video understanding.