Researchers have developed a benchmark for evaluating multimodal large language models (MLLMs) on clinical question answering for pulmonary embolism (PE) risk assessment. The study used the INSPECT dataset, which comprises over 23,000 computed tomography pulmonary angiography (CTPA) studies, and formulated eight diagnostic and prognostic tasks. Models such as Gemma4 E4B and Gemma4 E2B performed better when electronic health record (EHR) data was combined with CTPA images, and the benefit was more pronounced for PE diagnosis than for prognostic tasks such as readmission prediction. The results suggest that compact multimodal models have potential for early PE risk detection and explanation.
AI IMPACT This research points to the potential of multimodal LLMs in clinical settings, with possible future applications in early disease risk detection and explanation.
RANK_REASON The cluster contains an academic paper detailing a new benchmark and an evaluation of multimodal large language models on a specific clinical task.
AI-generated summary · Google Gemini · from 1 source.