New benchmark and reasoning method improve AI understanding of sports videos

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 2 sources

Researchers have introduced SportsTime, a benchmark dataset for evaluating multimodal large language models (MLLMs) on long-form sports videos. The dataset includes over 14,000 question-answer pairs and 50,000 temporal evidence annotations, targeting the difficulty of locating and integrating sparse evidence across a long video. They also propose Chain-of-Time Reasoning (CoTR), a method that strengthens temporal compositional reasoning by grounding evidence composition and running an iterative evidence-seeking loop during inference.
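The idea of an iterative evidence-seeking loop can be illustrated with a toy sketch. This is not the paper's CoTR implementation: the `Segment` type, the `seek_evidence` function, the keyword-overlap scoring, and the sample captions are all hypothetical stand-ins chosen for illustration. The sketch only shows the general pattern of repeatedly fetching the clip that covers the most still-missing evidence, then ordering the collected clips in time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float   # clip start, in seconds
    end: float     # clip end, in seconds
    caption: str   # hypothetical text description of the clip


def words(seg):
    """Lower-cased word set of a segment caption."""
    return set(seg.caption.lower().split())


def seek_evidence(segments, query_terms, max_steps=5):
    """Greedily collect segments until every query term is covered.

    Assumption: keyword overlap stands in for whatever relevance
    signal a real model would use.
    """
    missing = {t.lower() for t in query_terms}
    evidence = []
    for _ in range(max_steps):
        if not missing:
            break  # all required evidence has been found
        unused = [s for s in segments if s not in evidence]
        best = max(unused, key=lambda s: len(missing & words(s)), default=None)
        if best is None or not (missing & words(best)):
            break  # no remaining segment adds new evidence
        evidence.append(best)
        missing -= words(best)
    # Order evidence chronologically so it can be composed over time.
    return sorted(evidence, key=lambda s: s.start), missing


segments = [
    Segment(12.0, 18.0, "home team corner kick"),
    Segment(95.0, 101.0, "striker scores goal"),
    Segment(640.0, 648.0, "referee shows red card"),
    Segment(1200.0, 1206.0, "goal after substitution"),
]
evidence, missing = seek_evidence(segments, ["goal", "red", "card"])
print([(s.start, s.caption) for s in evidence], missing)
```

In this toy run the loop first picks the red-card clip (it covers two missing terms), then a goal clip, and stops once nothing is missing. A real system would replace the keyword match with model-driven retrieval over video frames.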


IMPACT Advances multimodal reasoning for complex video analysis, potentially improving applications in sports analytics and content summarization.

RANK_REASON Academic paper introducing a new benchmark dataset and reasoning method for video understanding.



COVERAGE [2]

arXiv cs.CV TIER_1 · Siyu Cao, Lu Zhang, Ruizhe Zeng, Zhi-yong Liu · 2026-04-27 04:00

Towards Temporal Compositional Reasoning in Long-Form Sports Videos

arXiv:2604.22226v1. Sports videos are a challenging domain for multimodal understanding because they involve complex and dynamic human activities. Despite rapid progress in Multimodal Large Language Models (MLLMs), long-horizon reasoning in sports vide…

arXiv cs.CV TIER_1 · Zhi-yong Liu · 2026-04-24 05:02

Towards Temporal Compositional Reasoning in Long-Form Sports Videos

Sports videos are a challenging domain for multimodal understanding because they involve complex and dynamic human activities. Despite rapid progress in Multimodal Large Language Models (MLLMs), long-horizon reasoning in sports videos remains difficult, as answering questions req…
