PulseAugur
OmniPro: A Comprehensive Benchmark for Omni-Proactive Streaming Video Understanding

New benchmarks evaluate omni-modal large language models on long video understanding

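Researchers have introduced two new benchmarks, OmniPro and VideoOdyssey, for evaluating how well omni-modal large language models understand long, complex video content. OmniPro targets proactive streaming video understanding: it evaluates a model's ability to decide when to speak and what to say from an audio-visual stream, and it contains 2,700 human-verified samples spanning different tasks. VideoOdyssey targets ultra-long-context video understanding: it contains extremely long videos (109 minutes on average) and evaluates continuous reasoning and memory retention over long time spans. Both benchmarks highlight the limitations of current models in long-horizon robustness, audio utilization, and fine-grained perception, especially when handling non-speech audio.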

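Impact: These benchmarks will drive the development of AI models that can understand complex, long-form video content, which is critical for applications such as surveillance, content analysis, and autonomous systems.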

Ranking rationale: Two new research papers introduce benchmarks for evaluating AI models.


AI-generated summary · Google Gemini · from 5 sources.

Sources [5]

  1. Hugging Face Daily Papers TIER_1 English(EN) ·

    OmniPro: A Comprehensive Benchmark for Omni-Proactive Streaming Video Understanding

    OmniPro is introduced as the first benchmark for evaluating omni-modal large language models' proactive streaming video understanding, featuring diverse tasks and dual-mode evaluation protocols.

  2. arXiv cs.CV TIER_1 English(EN) · Peiran Wu, Yunze Liu, Chi-Hao Wu, Chen Chen, Junxiao Shen ·

    O-MARC: Omni Memory-Augmented Compression Distillation for Efficient Video Understanding

    arXiv:2605.26584v1. Omnimodal large language models enable unified audio video understanding, but long joint token sequences make inference costly, and existing benchmarks do not fully isolate audio visual association in noisy user generated videos. We…

  3. arXiv cs.CV TIER_1 English(EN) · Ming Xie, Zizheng Huang, Xudong Tan, Chao Wang, Xiangyu Zeng, Wenxiao Wu, Tao Chen, Limin Wang, Yanwei Fu ·

    StreamOV: Streaming Omni-Video Understanding via Evidence-Guided Memory and Response Triggering

    arXiv:2605.25621v1. While streaming omni-video understanding demands continuous perception and proactive, real-time interaction, this crucial area remains largely under-explored. Current omni-modal methods are inherently designed for offline settings, …

  4. arXiv cs.CV TIER_1 English(EN) · Yanwei Fu ·

    StreamOV: Streaming Omni-Video Understanding via Evidence-Guided Memory and Response Triggering

    While streaming omni-video understanding demands continuous perception and proactive, real-time interaction, this crucial area remains largely under-explored. Current omni-modal methods are inherently designed for offline settings, limiting their applicability in streaming scenar…

  5. arXiv cs.CV TIER_1 English(EN) · Haichen He, Jiayi Zhou, Sifeng Shang, Yihan Hu, Yuanhan Zhang, Kaiyang Zhou ·

    VideoOdyssey: A Benchmark for Ultra-Long-Context and Omni-Modal Video Understanding

    arXiv:2605.22907v1. Real-world long video understanding requires models to perform continuous tracking, information integration and memory retention over massive temporal spans within extreme video durations. Mastering this intense cognitive load const…