SurgAtlas: A Large-Scale Surgical Video-Language Dataset with 2,391 Hours of Open and Minimally Invasive Surgery

New dataset improves AI understanding of surgical video

By the PulseAugur editorial team · [2 sources] · 2026-06-24 14:53

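Researchers developed SurgSTU-Pipeline to create SurgSTU, a new dataset for fine-grained spatial-temporal understanding of surgical video. The pipeline addresses the limitations of existing datasets and the challenges of manual annotation and LLM-generated data. SurgSTU contains more than 6,700 video clips and 150,000 question-answer pairs. Results on it show that general-purpose vision-language models perform poorly at first but can be improved through in-context learning and fine-tuning on this specialized dataset.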

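Impact: This specialized dataset and pipeline could significantly improve the accuracy and capability of AI systems that analyze surgical procedures, potentially leading to better computer-assisted surgery tools.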

Ranking rationale: This cluster contains an academic paper detailing a new method and dataset for AI research.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-24 14:53

SurgAtlas: A Large-Scale Surgical Video-Language Dataset with 2,391 Hours of Open and Minimally Invasive Surgery

We introduce SurgAtlas, the largest surgical video-language dataset to date, comprising 15,291 videos (2,391 hours) spanning 18 surgical specialties and over 5,000 procedure types, sourced entirely from publicly available YouTube content. SurgAtlas is also the first surgical vide…

arXiv cs.CV TIER_1 English(EN) · Lennart Maack, Alexander Schlaefer · 2026-06-29 04:00

An Approach to Enriching Surgical Video Datasets for Fine-Grained Spatial-Temporal Understanding of Vision-Language Models

arXiv:2604.00784v2 Announce Type: replace Abstract: Surgical video understanding is a crucial prerequisite for advancing Computer-Assisted Surgery. While vision-language models (VLMs) have recently been applied to the surgical domain, existing surgical vision-language datasets la…
