Jolia: Concept-Level Vision-Language Alignment for 3D CT Contrastive Learning

Jolia model strengthens 3D CT analysis through concept-level vision-language alignment

By the PulseAugur editorial team · [3 sources] · 2026-06-23 13:35

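Researchers have developed Jolia, a new 3D CT foundation model that improves vision-language alignment for medical imaging. Unlike standard CLIP-style pretraining, Jolia uses a method called ConQuer (concept queries) to build localized alignments for specific concepts in radiology reports. This lets the model capture more of the detail in lengthy medical text, and it provides built-in spatial interpretability by producing an attention map for each concept. Jolia outperforms baseline models on a range of benchmarks covering tasks such as classification and report generation.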
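To make the idea of per-concept attention maps concrete, here is a minimal sketch of query-based attention pooling over 3D patch features. It is an illustration under stated assumptions, not the paper's ConQuer implementation: the function name `concept_attention_pool`, the scaled dot-product scoring, the 4×4×4 patch grid, and the random inputs are all hypothetical.

```python
import numpy as np

def concept_attention_pool(patch_feats, concept_queries):
    """Pool patch features once per concept query via softmax attention.

    Hypothetical helper for illustration only (not taken from the paper).
    patch_feats:     (N, D) array, one feature vector per 3D patch.
    concept_queries: (C, D) array, one query vector per report concept.
    Returns (pooled, attn): pooled is (C, D); attn is (C, N), rows sum to 1.
    """
    d = patch_feats.shape[1]
    # Scaled dot-product score between every concept and every patch.
    scores = concept_queries @ patch_feats.T / np.sqrt(d)   # (C, N)
    # Subtract the row maximum before exponentiating for numerical stability.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    attn = weights / weights.sum(axis=1, keepdims=True)     # softmax over patches
    # Each concept gets its own attention-weighted average of patch features.
    pooled = attn @ patch_feats                              # (C, D)
    return pooled, attn

# Assumed toy setup: a 4x4x4 grid of patches, 32-dim features, 3 concepts.
rng = np.random.default_rng(0)
grid = (4, 4, 4)
patches = rng.normal(size=(64, 32))
queries = rng.normal(size=(3, 32))

pooled, attn = concept_attention_pool(patches, queries)
# Reshaping each row back onto the patch grid gives one spatial map per concept.
attn_maps = attn.reshape(3, *grid)
```

In a setup like this, each pooled vector could be compared with the text embedding of its concept, while the reshaped attention weights serve as the spatial explanation for that concept.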

Impact: This research could lead to more accurate and more interpretable AI tools for medical diagnosis and report generation.

Ranking rationale: This cluster describes a research paper that details a new AI model and method for medical image analysis.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · based on 3 sources.

Sources [3]

arXiv cs.CV TIER_1 English(EN) · Jianpeng Zhang · 2026-06-24 08:24

Disease-Centric Vision-Language Pretraining with Hybrid Visual Encoding for 3D Computed Tomography

Vision-language pre-training (VLP) holds great promise for general-purpose medical AI by leveraging radiology reports as rich textual supervision, yet existing methods struggle with 3D CT imaging due to inefficient visual backbones and coarse semantic alignment. To address these …

arXiv cs.CV TIER_1 English(EN) · Julien Khlaut, Charles Corbière, Baptiste Callard, Amaury Prat, Leo Butsanets, Antoine Saporta, Théo Danielou, Leo Machado, Korentin Le Floch, Tom Boeken, Pierre Manceron, Corentin Dancette · 2026-06-24 04:00

Jolia: Concept-Level Vision-Language Alignment for 3D CT Contrastive Learning

arXiv:2606.24570v1. Abstract: Vision-language contrastive pretraining has become the dominant recipe for 3D medical foundation models, leveraging the large volumes of paired scans and reports produced in clinical practice. However, medical images usually span do…

arXiv cs.CV TIER_1 English(EN) · Corentin Dancette · 2026-06-23 13:35

Jolia: Concept-Level Vision-Language Alignment for 3D CT Contrastive Learning

Vision-language contrastive pretraining has become the dominant recipe for 3D medical foundation models, leveraging the large volumes of paired scans and reports produced in clinical practice. However, medical images usually span dozens of organs, and radiological reports are muc…
