Brief · PulseAugur

TOOL · arXiv cs.CV English(EN) · 1w

Automated Report-Derived Oncology VQA Benchmark for Evaluating Vision-Language Models on 3D Medical Imaging

Researchers have developed an automated pipeline that builds a benchmark for evaluating vision-language models (VLMs) on 3D medical imaging in oncology. The pipeline generates question-answer datasets directly from radiology reports paired with 3D scans, producing both schema-derived and LLM-generated questions. Evaluations on four cancer cohorts showed that no single VLM dominates and that performance varies significantly by dataset. On certain scans, some models performed as well, or better, when blinded to the image.
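As a rough illustration of what "schema-derived" questions and an image-blinded comparison could look like, here is a minimal Python sketch. It is not the paper's pipeline: the field names, question templates, and exact-match scoring are all hypothetical assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class QAItem:
    question: str
    answer: str


# Hypothetical schema: structured report field -> question template.
# These fields and wordings are illustrative, not taken from the benchmark.
SCHEMA_TEMPLATES = {
    "tumor_present": "Is a tumor visible in this scan?",
    "tumor_location": "In which organ or region is the primary lesion located?",
}


def schema_questions(report_fields):
    """Turn structured report fields into QA items, skipping missing fields."""
    items = []
    for field, template in SCHEMA_TEMPLATES.items():
        value = report_fields.get(field)
        if value is None:
            continue  # the report did not state this finding
        items.append(QAItem(template, str(value).strip().lower()))
    return items


def accuracy(predictions, items):
    """Exact-match accuracy after trimming whitespace and lowercasing."""
    if not items:
        return 0.0
    correct = sum(
        pred.strip().lower() == item.answer
        for pred, item in zip(predictions, items)
    )
    return correct / len(items)
```

Scoring the same QA items twice, once with model answers produced from the scan and once with answers produced from the question text alone, gives the kind of blinded-versus-image gap the brief mentions. A gap near zero would suggest the questions can be answered from language priors rather than from the image.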

IMPACT This benchmark aims to improve VLM evaluation in medical imaging, potentially leading to more reliable AI tools for diagnosis and treatment planning.

LLM
arXiv
Vision-Language Models
3D Medical Imaging
Automated Report-Derived Oncology VQA Benchmark