New framework unifies segmentation and VQA in robotic surgery

By the PulseAugur editorial team · [1 source] · 2026-06-16 04:00

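Researchers have developed a framework that unifies pixel-level segmentation and visual question answering (VQA) in robotic surgery. The method uses object tokens generated by a vision-language model (VLM) to guide answer prediction and to produce segmentation masks through a SAM-based decoder. Because the object tokens are optimized jointly for the segmentation and VQA objectives, the model learns spatially grounded representations, which strengthens its reasoning and provides explicit pixel-level grounding. The method reports superior performance on the RAMIE and EndoVis18 datasets, improving fine-grained understanding of surgical scenes.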
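As a rough illustration of what optimizing object tokens for both objectives can look like, the sketch below combines an answer-classification loss with a mask loss into one scalar. The summary does not state which losses or weighting the authors use, so the softmax cross-entropy, the soft Dice loss, and the `seg_weight` parameter are all assumptions made for illustration, and every function name is hypothetical.

```python
import math


def cross_entropy(answer_logits, answer_index):
    """Softmax cross-entropy for a single answer prediction (assumed VQA loss)."""
    m = max(answer_logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in answer_logits))
    return log_sum_exp - answer_logits[answer_index]


def dice_loss(pred_probs, gt_mask, eps=1e-6):
    """Soft Dice loss over flattened per-pixel probabilities (assumed mask loss)."""
    intersection = sum(p * g for p, g in zip(pred_probs, gt_mask))
    total = sum(pred_probs) + sum(gt_mask)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def joint_loss(answer_logits, answer_index, pred_probs, gt_mask, seg_weight=1.0):
    """Weighted sum of both terms; seg_weight is a hypothetical hyperparameter."""
    vqa_term = cross_entropy(answer_logits, answer_index)
    seg_term = dice_loss(pred_probs, gt_mask)
    return vqa_term + seg_weight * seg_term
```

In a real model both terms would be computed from heads that read the same object tokens, so gradients from the mask loss and the answer loss would both update those tokens. That shared update is the sense in which the tokens act as a bridge between the two tasks.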

Impact: Strengthens fine-grained surgical scene understanding and reasoning in robotic surgery applications.

Ranking rationale: This cluster contains one academic paper detailing a new technical method in computer vision.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CV · TIER_1 · English (EN) · Yiping Li, Ronald de Jong, Romy van Jaarsveld, Franco Badaloni, Gino Kuiper, Jelle Ruurda, Josien Pluim, Marcel Breeuwer · 2026-06-16 04:00

Object Tokens as a Bridge Between Segmentation and Visual Question Answering in Robotic Surgery

arXiv:2606.15861v1 Announce Type: new Abstract: Visual Question Answering (VQA) in robotic surgery, referred to as surgical VQA, requires high-level understanding of complex surgical scenes and the integration of visual perception with language reasoning, with the potential to su…

