LLM-assisted cleaning improves chest CT dataset labels, study finds

By PulseAugur Editorial · [1 source] · 2026-06-21 08:01

A new study listed on Hugging Face Daily Papers evaluates whether large language models (LLMs) can clean and verify labels in CT-RATE, a large-scale public chest CT dataset. The researchers used GPT-5.4 to generate labels for chest CT scans and compared them against the existing labels, finding an overall agreement rate of 96.4%. The LLM-assisted approach was particularly effective at identifying and correcting discrepancies for findings such as lymphadenopathy, and may offer a scalable way to improve the quality of public imaging datasets for future research.
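The comparison described above can be illustrated with a minimal sketch. This is not the study's code: the function name `agreement_by_finding`, the `(report_id, finding)` key layout, the binary label encoding, and the toy records (including the placeholder `other_finding`) are all assumptions made for illustration.

```python
from collections import defaultdict


def agreement_by_finding(existing, llm):
    """Compare two label sets keyed by (report_id, finding) -> 0/1.

    Returns the overall agreement rate, a per-finding agreement rate,
    and the sorted list of discordant keys to send for manual review.
    """
    shared = existing.keys() & llm.keys()  # only compare labels present in both
    counts = defaultdict(lambda: [0, 0])   # finding -> [agreements, total]
    for report_id, finding in shared:
        counts[finding][1] += 1
        if existing[(report_id, finding)] == llm[(report_id, finding)]:
            counts[finding][0] += 1

    per_finding = {f: agree / total for f, (agree, total) in counts.items()}
    total = sum(t for _, t in counts.values())
    agree = sum(a for a, _ in counts.values())
    overall = agree / total if total else float("nan")
    discordant = sorted(k for k in shared if existing[k] != llm[k])
    return overall, per_finding, discordant


# Toy data, invented for illustration only (not taken from CT-RATE).
existing_labels = {
    ("r1", "lymphadenopathy"): 1,
    ("r1", "other_finding"): 0,
    ("r2", "lymphadenopathy"): 0,
    ("r2", "other_finding"): 1,
}
llm_labels = dict(existing_labels)
llm_labels[("r2", "lymphadenopathy")] = 1  # one simulated disagreement

overall, per_finding, discordant = agreement_by_finding(existing_labels, llm_labels)
print(overall, per_finding, discordant)
```

Breaking agreement down per finding, rather than reporting only the overall rate, is what would surface a label such as lymphadenopathy as a hotspot for discordance. The discordant keys are only candidates for review; a disagreement alone does not say which of the two labels is wrong.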

IMPACT LLM-assisted label cleaning could improve the quality of large medical imaging datasets at scale, supporting future research.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-21 08:01

Large Language Model-Assisted Cleaning of Report-Derived Labels in a Large-Scale Chest CT Dataset

Purpose: To evaluate whether large language model (LLM)-assisted label cleaning can identify label-report discordance in CT-RATE, a large-scale public chest CT dataset. Materials and Methods: After report-level deduplication, 24,446 unique radiology reports were identified. Twelv…
