PulseAugur

New benchmark and reasoning method improve AI understanding of sports videos

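Researchers have introduced SportsTime, a new benchmark dataset for evaluating how well multimodal large language models (MLLMs) understand long-form sports videos. The dataset contains more than 14,000 question-answer pairs and 50,000 temporal evidence annotations, targeting the challenge of locating and integrating sparse evidence. To address these challenges, they also propose Chain-of-Time Reasoning (CoTR), a method that strengthens temporal compositional reasoning through grounded evidence composition and an iterative evidence-search loop at inference time.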

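Impact: Advances multimodal reasoning for complex video analysis, and may improve applications such as sports analytics and content summarization.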

Ranking rationale: Academic paper introducing a new benchmark dataset and reasoning method for video understanding.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.CV TIER_1 English(EN) · Siyu Cao, Lu Zhang, Ruizhe Zeng, Zhi-yong Liu

    Towards Temporal Compositional Reasoning in Long-Form Sports Videos

    arXiv:2604.22226v1. Abstract: Sports videos are a challenging domain for multimodal understanding because they involve complex and dynamic human activities. Despite rapid progress in Multimodal Large Language Models (MLLMs), long-horizon reasoning in sports vide…

  2. arXiv cs.CV TIER_1 English(EN) · Zhi-yong Liu

    Towards Temporal Compositional Reasoning in Long-Form Sports Videos

    Sports videos are a challenging domain for multimodal understanding because they involve complex and dynamic human activities. Despite rapid progress in Multimodal Large Language Models (MLLMs), long-horizon reasoning in sports videos remains difficult, as answering questions req…