CNNs, Transformers, Hybrid, and Vision Language Models for Skin Cancer Detection
Researchers evaluated twelve deep learning models for skin cancer detection using a unified approach on the PAD-UFES-20 dataset. The study compared convolutional neural networks (CNNs), vision transformers (ViTs), hybrid models, and vision-language models (VLMs). Well-tuned CNNs offered a solid baseline, but transformer-based architectures generally discriminated better between classes. Hybrid models and a SigLIP-based VLM achieved the best overall performance. The results offer practical insights for real-world deployment in skin cancer screening.
AI IMPACT Provides practical guidance on selecting deep learning models for real-world skin cancer screening applications.