Researchers have introduced CORTEX, a new benchmark designed to improve the trustworthiness of multimodal large language models (MLLMs) in 3D chest CT analysis. Existing datasets often reduce complex radiology reports to simple answer pairs, omitting the crucial reasoning process that clinicians use. CORTEX addresses this by providing structured, four-stage diagnostic traces that mirror a radiologist's workflow, from visual observation to answer synthesis. This benchmark, built on the CT-RATE dataset and validated with clinician input, includes over 76,000 reasoning traces to enable the development and evaluation of MLLMs that can provide traceable and verifiable diagnoses.
AI IMPACT: This benchmark could lead to more reliable and interpretable AI diagnostic tools in medical imaging.
AI-generated summary · Google Gemini · from 2 sources.