LLM-assisted label cleaning improves chest CT dataset accuracy

By PulseAugur Editorial · [1 source] · 2026-06-21 08:01

Researchers used a large language model (LLM) to audit the labels in CT-RATE, a large-scale public chest CT dataset. They compared the dataset's existing labels with labels generated by GPT-5.4 and identified instances of label-report discordance, where a label does not match the radiology report it is attached to. Radiologist adjudication supported the LLM-derived labels in a significant majority of cases, suggesting that LLM-assisted cleaning can improve the quality of public imaging datasets used in future research.
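The comparison step can be illustrated with a minimal sketch. It is not the paper's pipeline: the report IDs, finding names, label values, and the `find_discordances` helper are all hypothetical, and the LLM call that would produce the second label set is not shown. The sketch only demonstrates how disagreements between two label sources might be collected for radiologist review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discordance:
    """One (report, finding) pair where the two label sources disagree."""
    report_id: str
    finding: str
    existing: int
    llm: int


def find_discordances(existing_labels, llm_labels):
    """Return every (report, finding) pair whose labels differ.

    Both arguments map report_id -> {finding_name: 0 or 1}.
    Reports or findings missing from llm_labels are skipped.
    """
    flagged = []
    for report_id, existing in existing_labels.items():
        llm = llm_labels.get(report_id)
        if llm is None:
            continue  # no LLM label for this report, nothing to compare
        for finding, value in existing.items():
            if finding in llm and llm[finding] != value:
                flagged.append(
                    Discordance(report_id, finding, value, llm[finding])
                )
    return flagged


# Toy data (hypothetical): two reports, two findings each.
existing_labels = {
    "report_001": {"finding_a": 1, "finding_b": 0},
    "report_002": {"finding_a": 0, "finding_b": 0},
}
llm_labels = {
    "report_001": {"finding_a": 1, "finding_b": 1},
    "report_002": {"finding_a": 0, "finding_b": 0},
}

flagged = find_discordances(existing_labels, llm_labels)
for d in flagged:
    print(f"{d.report_id} {d.finding}: existing={d.existing} llm={d.llm}")
```

In a workflow like the one described, only the flagged pairs would go to a radiologist, which keeps expert review focused on the cases where the two sources disagree.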

IMPACT Enhances the quality and reliability of medical imaging datasets, potentially accelerating AI research and development in healthcare.

RANK_REASON Academic paper detailing a new methodology for data cleaning using LLMs.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Osamu Abe · 2026-06-21 08:01

Large Language Model-Assisted Cleaning of Report-Derived Labels in a Large-Scale Chest CT Dataset

Purpose: To evaluate whether large language model (LLM)-assisted label cleaning can identify label-report discordance in CT-RATE, a large-scale public chest CT dataset. Materials and Methods: After report-level deduplication, 24,446 unique radiology reports were identified. Twelv…
