Trusting Right Predictions for Wrong Reasons: A LIME Based Analysis of Deep Learning Interpretability in Lung Cancer Diagnosis
A new study published on arXiv examines the interpretability of deep learning models used for lung cancer diagnosis. Three distinct models (CNN, ResNet50, and ViT) all achieved high predictive accuracy, with ResNet50 reaching 98.61% accuracy and all three reaching a ROC-AUC of 0.99, yet their reasoning differed significantly. Using LIME to analyze each model's explanations, the researchers found that explanation correlations between model pairs were below 0.26, indicating that the models focused on different image regions to arrive at their predictions. This suggests that prediction agreement alone is insufficient for validating clinical AI systems, and that interpretability should be a separate, crucial evaluation criterion.
AI IMPACT: Highlights the critical need for interpretability in clinical AI, suggesting that high accuracy alone does not make current models reliable, since their reasoning processes can differ.
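
The pairwise comparison of explanations described above can be illustrated with a minimal sketch. The summary does not say which correlation coefficient the study used or how the LIME weights were turned into per-image maps, so this example assumes Pearson correlation over flattened per-pixel attribution maps. The function names (`explanation_correlation`, `pairwise_explanation_correlations`) and the tiny synthetic maps are hypothetical and are not taken from the study; generating real maps (for example with the `lime` package) is left out.

```python
import itertools

import numpy as np


def explanation_correlation(map_a, map_b):
    """Pearson correlation between two attribution maps of equal shape.

    Assumption: each map holds one importance value per pixel for the
    same input image, e.g. LIME superpixel weights painted onto pixels.
    """
    a = np.asarray(map_a, dtype=float).ravel()
    b = np.asarray(map_b, dtype=float).ravel()
    if a.shape != b.shape:
        raise ValueError("attribution maps must have the same shape")
    if a.std() == 0 or b.std() == 0:
        # Correlation is undefined for a constant map.
        return float("nan")
    return float(np.corrcoef(a, b)[0, 1])


def pairwise_explanation_correlations(maps_by_model):
    """Correlation for every unordered pair of models."""
    return {
        (m1, m2): explanation_correlation(maps_by_model[m1], maps_by_model[m2])
        for m1, m2 in itertools.combinations(maps_by_model, 2)
    }


# Synthetic 8x8 maps (illustrative only, not data from the study).
top_left = np.zeros((8, 8))
top_left[:4, :4] = 1.0          # a model attending to the top-left quadrant
bottom_right = np.zeros((8, 8))
bottom_right[4:, 4:] = 1.0      # a model attending to the bottom-right quadrant

maps = {
    "model_a": top_left,
    "model_b": bottom_right,
    "model_c": top_left.copy(),  # same focus as model_a
}

correlations = pairwise_explanation_correlations(maps)
for pair, value in correlations.items():
    print(pair, round(value, 3))
```

In this toy setup all three hypothetical models could output the same label for the image, yet only the pair that attends to the same quadrant shows a high explanation correlation. That is the kind of gap between prediction agreement and explanation agreement the study reports.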