PulseAugur

New MedHorizon benchmark tests AI's ability to understand long medical videos

Researchers have introduced MedHorizon, a benchmark that tests multimodal large language models (MLLMs) on long-form medical video understanding. It comprises 759 hours of clinical procedures and 1,253 questions, and targets the difficulty of locating sparse but crucial evidence within lengthy, often redundant visual data. Current models struggle: the best reaches only 41.1% accuracy, pointing to major bottlenecks in evidence retrieval and in clinical reasoning over complete workflows.

Impact: Establishes a new, challenging benchmark for medical video understanding, pushing the development of MLLMs toward complex clinical reasoning.

Ranking rationale: The cluster describes a new academic paper that introduces a benchmark for evaluating AI models.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

  1. arXiv cs.CV TIER_1 English(EN) · Bodong Du, Bowen Liu, Yang Yu, Xinpeng Ding, Zhiheng Wu, Shuning Wang, Shuo Nie, Naiming Liu, Qifeng Chen, Yangqiu Song, Xiaomeng Li

    MedHorizon: Towards Long-context Medical Video Understanding in the Wild

    arXiv:2605.06537v1. Abstract: Medical multimodal large language models (MLLMs) have advanced image understanding and short-video analysis, but real clinical review often requires full-procedure video understanding. Unlike general long videos, medical procedures …

  2. arXiv cs.CV TIER_1 English(EN) · Xiaomeng Li

    MedHorizon: Towards Long-context Medical Video Understanding in the Wild

    Medical multimodal large language models (MLLMs) have advanced image understanding and short-video analysis, but real clinical review often requires full-procedure video understanding. Unlike general long videos, medical procedures contain highly redundant anatomical views, while…