PulseAugur

New benchmark and reasoning method improve AI understanding of sports videos

Researchers have introduced SportsTime, a benchmark dataset for evaluating multimodal large language models (MLLMs) on understanding long-form sports videos. The dataset includes over 14,000 question-answer pairs and 50,000 temporal evidence annotations, targeting the challenge of locating and integrating sparse evidence scattered across long videos. The authors also propose Chain-of-Time Reasoning (CoTR), a method that strengthens temporal compositional reasoning by grounding evidence composition and running an iterative evidence-seeking loop during inference.
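The summary only describes the inference procedure at a high level, but the "iterative evidence-seeking loop" can be illustrated with a minimal sketch: the model repeatedly picks a video segment to inspect, grounds a piece of evidence from it, and stops when it judges the evidence sufficient before composing an answer. Everything below is a hypothetical illustration with toy stand-in functions, not the paper's actual implementation:

```python
# Illustrative sketch of an iterative evidence-seeking loop, based only on
# the summary's description. All names here are hypothetical stand-ins.

def propose_next_segment(question, evidence, n_segments):
    """Toy stand-in for the model's proposal step: inspect segments
    left to right, one per round; return None when done."""
    seen = {i for i, _ in evidence}
    for i in range(n_segments):
        if i not in seen:
            return i
    return None

def describe_segment(segment):
    """Toy stand-in for a grounded captioner; segments are text here."""
    return segment

def compose_answer(question, evidence):
    """Toy stand-in for evidence composition: join captions in
    temporal order."""
    return " -> ".join(cap for _, cap in sorted(evidence))

def answer_with_evidence_loop(question, video_segments, max_rounds=3):
    """Iteratively gather sparse temporal evidence, then answer."""
    evidence = []  # (segment_index, caption) pairs found so far
    for _ in range(max_rounds):
        # 1. Propose which segment to inspect next, given evidence so far.
        idx = propose_next_segment(question, evidence, len(video_segments))
        if idx is None:  # evidence judged sufficient
            break
        # 2. Ground the proposal: extract evidence from that segment.
        evidence.append((idx, describe_segment(video_segments[idx])))
    # 3. Compose the gathered evidence into a final answer.
    return compose_answer(question, evidence)

segments = ["kickoff", "goal", "substitution", "final whistle"]
print(answer_with_evidence_loop("When was the goal scored?",
                                segments, max_rounds=2))
```

In a real MLLM pipeline, the proposal and grounding steps would each be model calls over video frames; the loop structure, where each round conditions on evidence collected so far, is the part the summary emphasizes.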

Summary written by gemini-2.5-flash-lite from 2 sources.

IMPACT Advances multimodal reasoning for complex video analysis, potentially improving applications in sports analytics and content summarization.

RANK_REASON Academic paper introducing a new benchmark dataset and reasoning method for video understanding.

Read on arXiv cs.CV →

COVERAGE [2]

  1. arXiv cs.CV TIER_1 · Siyu Cao, Lu Zhang, Ruizhe Zeng, Zhi-yong Liu

    Towards Temporal Compositional Reasoning in Long-Form Sports Videos

    arXiv:2604.22226v1 Announce Type: new Abstract: Sports videos are a challenging domain for multimodal understanding because they involve complex and dynamic human activities. Despite rapid progress in Multimodal Large Language Models (MLLMs), long-horizon reasoning in sports vide…

  2. arXiv cs.CV TIER_1 · Zhi-yong Liu

    Towards Temporal Compositional Reasoning in Long-Form Sports Videos

    Sports videos are a challenging domain for multimodal understanding because they involve complex and dynamic human activities. Despite rapid progress in Multimodal Large Language Models (MLLMs), long-horizon reasoning in sports videos remains difficult, as answering questions req…