RadOT-Eval: Auditable Structured-Evidence Transport for Radiology Report Evaluation
Researchers have developed RadOT-Eval, a novel framework for evaluating the accuracy of AI-generated radiology reports. The system breaks reports down into structured clinical evidence units and uses optimal transport to align corresponding pieces of information. RadOT-Eval demonstrated strong correlations with human-annotated error burdens, outperforming existing metrics and an LLM-based evaluator on independent datasets.
AI IMPACT: Provides a more auditable and accurate method for evaluating high-stakes AI-generated clinical text, potentially improving safety and reliability in medical applications.
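
To make the alignment idea concrete, here is a minimal sketch of how optimal transport could match evidence units between a reference report and a generated one. It is not RadOT-Eval's implementation: the summary does not specify the evidence-unit schema, the cost function, or the transport variant, so the plain-string units, the token-overlap cost in `unit_cost`, the entropic Sinkhorn solver, and all function names below are assumptions made for illustration.

```python
import math


def unit_cost(a, b):
    """Hypothetical cost: 1 - Jaccard overlap of the units' lowercase tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    if not union:  # two empty units are treated as identical
        return 0.0
    return 1.0 - len(ta & tb) / len(union)


def sinkhorn(cost, reg=0.1, iters=200):
    """Entropic optimal transport between uniform marginals (assumed setup).

    Returns the transport plan as a list of rows; plan[i][j] is the mass
    moved from reference unit i to candidate unit j.
    """
    n, m = len(cost), len(cost[0])
    a, b = [1.0 / n] * n, [1.0 / m] * m
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_score(reference_units, candidate_units, reg=0.1):
    """Return (total transport cost, plan) for two non-empty unit lists."""
    cost = [[unit_cost(r, c) for c in candidate_units] for r in reference_units]
    plan = sinkhorn(cost, reg)
    total = sum(
        plan[i][j] * cost[i][j]
        for i in range(len(cost))
        for j in range(len(cost[0]))
    )
    return total, plan


# Illustrative evidence units (invented for this sketch, not from any dataset).
reference = ["pleural effusion right present", "pneumothorax absent"]
faithful = ["pleural effusion right present", "pneumothorax absent"]
erroneous = ["pleural effusion left present", "pneumothorax present"]

faithful_cost, faithful_plan = transport_score(reference, faithful)
erroneous_cost, erroneous_plan = transport_score(reference, erroneous)
print(f"faithful report cost:  {faithful_cost:.4f}")
print(f"erroneous report cost: {erroneous_cost:.4f}")
```

In a sketch like this, the transport plan is what makes the score inspectable: each entry shows how much mass links one reference unit to one generated unit, so a high total cost can be traced back to the specific pairs that contributed to it.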