PulseAugur
English(EN) Auditing Frontier Vision-Language Models for Trustworthy Medical VQA: Grounding Failures, Format Collapse, and Domain Adaptation

Frontier VLMs fail medical VQA tests due to poor grounding and left-right confusion

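A new paper evaluates five leading vision-language models (VLMs) on trustworthy medical visual question answering (VQA). It finds that the models are markedly limited in accurately identifying anatomical targets and tend to confuse left and right; the best-performing model reaches a mean IoU of only 0.23. Integrating grounding into the pipeline degrades performance further, which points to grounding as the key bottleneck. Domain adaptation shows promise for improving VQA accuracy, but perception and trustworthiness problems remain.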
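For readers unfamiliar with the metric, here is a minimal sketch of intersection-over-union (IoU), assuming axis-aligned bounding boxes in (x1, y1, x2, y2) pixel coordinates. The summary does not state the paper's exact evaluation protocol, so the box format and the helper names `box_iou`, `mirror_box`, and `mean_iou` are illustrative assumptions, not the authors' code.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlap rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mirror_box(box, image_width):
    """Flip a box horizontally, mimicking a left-right mix-up (hypothetical helper)."""
    x1, y1, x2, y2 = box
    return (image_width - x2, y1, image_width - x1, y2)


def mean_iou(preds, targets):
    """Average IoU over paired predictions and ground-truth boxes."""
    return sum(box_iou(p, t) for p, t in zip(preds, targets)) / len(preds)


# Made-up numbers for illustration only, not data from the paper.
target = (10, 20, 50, 60)            # ground-truth box in a 200-pixel-wide image
shifted = (30, 20, 70, 60)           # prediction that overlaps half of the target
flipped = mirror_box(target, 200)    # prediction on the wrong side of the image

print(box_iou(shifted, target))
print(box_iou(flipped, target))
print(mean_iou([shifted, flipped], [target, target]))
```

Under these assumptions, a prediction placed on the wrong side of the image has no overlap with the target and scores zero, so a few left-right mix-ups pull a mean IoU down sharply.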

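Impact: Identifies critical perception and grounding failures of frontier VLMs in medical applications, indicating that domain adaptation is needed to improve trustworthiness.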

Ranking rationale: Academic paper evaluating frontier models on a specific task.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 3 sources. How we write summaries →


Sources [3]

  1. arXiv cs.AI TIER_1 English(EN) · Xupeng Chen, Binbin Shi, Chenqian Le, Qifu Yin, Lang Lin, Haowei Ni, Ran Gong, Panfeng Li ·

    Auditing Frontier Vision-Language Models for Trustworthy Medical VQA: Grounding Failures, Format Collapse, and Domain Adaptation

    arXiv:2604.27720v1. Deploying vision-language models (VLMs) in clinical settings demands auditable behavior under realistic failure conditions, yet the failure landscape of frontier VLMs on specialized medical inputs is poorly characterized. We audit f…

  2. arXiv cs.AI TIER_1 English(EN) · Panfeng Li ·

    Auditing Frontier Vision-Language Models for Trustworthy Medical VQA: Grounding Failures, Format Collapse, and Domain Adaptation

    Deploying vision-language models (VLMs) in clinical settings demands auditable behavior under realistic failure conditions, yet the failure landscape of frontier VLMs on specialized medical inputs is poorly characterized. We audit five recent frontier and grounding-aware VLMs (Ge…

  3. Hugging Face Daily Papers TIER_1 English(EN) ·

    Auditing Frontier Vision-Language Models for Trustworthy Medical VQA: Grounding Failures, Format Collapse, and Domain Adaptation

    Deploying vision-language models (VLMs) in clinical settings demands auditable behavior under realistic failure conditions, yet the failure landscape of frontier VLMs on specialized medical inputs is poorly characterized. We audit five recent frontier and grounding-aware VLMs (Ge…