Towards Unified Surgical Scene Understanding: Bridging Reasoning and Grounding via MLLMs

New Multimodal Large Language Model Framework Unifies Surgical Scene Understanding

By the PulseAugur editorial team · [1 source] · 2026-05-13 13:42

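Researchers have developed SurgMLLM, a framework that unifies surgical scene understanding by combining high-level reasoning with low-level visual grounding. The multimodal large language model (MLLM) is fine-tuned on surgical video so that it jointly models surgical phases, instrument-verb-target triplets, and their precise segmentation. On the CholecT45-Scene dataset, the system raises the triplet recognition metric AP_IVT from 40.7% to 46.0%, an absolute gain of 5.3 percentage points, and outperforms existing methods on phase recognition and segmentation.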
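On CholecT45/CholecT50-style benchmarks, AP_IVT is commonly defined as average precision computed separately for each instrument-verb-target triplet class and then averaged across classes. The sketch below illustrates that idea under stated assumptions: per-frame confidence scores and binary labels for each triplet class, the non-interpolated AP definition, no tied scores, and classes with no positive frames skipped. The function names are hypothetical, and this is not the paper's or the official benchmark's evaluation code.

```python
import math
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP for one class (assumes no tied scores).

    Ranks frames by descending confidence and averages the precision
    measured at each rank where a positive frame appears.
    Returns NaN if the class has no positive frames.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        return float("nan")
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / n_pos


def mean_average_precision(scores: List[List[float]], labels: List[List[int]]) -> float:
    """Mean AP over classes; rows are frames, columns are (hypothetical) triplet classes.

    Classes without any positive frame are skipped, an assumed convention.
    """
    n_classes = len(scores[0])
    aps = []
    for c in range(n_classes):
        ap = average_precision([row[c] for row in scores], [row[c] for row in labels])
        if not math.isnan(ap):
            aps.append(ap)
    return sum(aps) / len(aps) if aps else float("nan")


# Toy example: 4 frames, 3 triplet classes (class 2 has no positives and is skipped)
frame_scores = [[0.9, 0.2, 0.5], [0.8, 0.7, 0.4], [0.3, 0.6, 0.3], [0.1, 0.1, 0.2]]
frame_labels = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(mean_average_precision(frame_scores, frame_labels), 4))  # 0.9167
```

Averaging per class rather than pooling all predictions keeps rare triplets from being swamped by frequent ones, which is why a macro-averaged metric like this is a demanding test of triplet recognition.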

Impact: Strengthens AI support for medical procedures by enabling a more comprehensive understanding of surgical video.

Ranking rationale: This cluster consists of an academic paper presenting a new framework and model for surgical scene understanding.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.AI · Weixin Si · 2026-05-13 13:42

Towards Unified Surgical Scene Understanding: Bridging Reasoning and Grounding via MLLMs

Surgical scene understanding is a cornerstone of computer-assisted intervention. While recent advances, particularly in surgical image segmentation, have driven progress, real-world clinical applications require a more holistic understanding that jointly captures procedural conte…

