PulseAugur
Live 11:49:06

Multimodal LLMs show limited real-world accuracy in clinical dermatology

A new study evaluated the real-world performance of multimodal large language models (MLLMs) in clinical dermatology and found a significant gap between benchmark results and actual clinical utility. Although models such as GPT-4.1 showed promise on public datasets, their diagnostic accuracy dropped considerably on a real-world cohort of 5,811 cases. Adding clinical context improved performance, but outputs remained sensitive to inaccuracies in that data, suggesting that current MLLMs are not yet reliable enough for clinical deployment.

Impact: Current multimodal LLMs perform significantly worse in real-world clinical dermatology than on benchmarks, indicating that they are not yet ready for deployment.

Ranking rationale: This is a research paper evaluating the performance of existing multimodal LLMs on a specific clinical task.

Read on arXiv cs.CV

AI-generated summary · Google Gemini · from 1 source.


Sources [1]

  1. arXiv cs.CV · English (EN) · Roy Jiang, Hyunjae Kim, Zhenyue Qin, Morten Lee, Margaret MacGibeny, Ailish Hanly, Angela Sadlowski, Shanin Chowdhury, Xuguang Ai, Jeffrey Gehlhausen, Qingyu Chen

    Are Multimodal LLMs Ready for Clinical Dermatology? A Real-World Evaluation in Dermatology

    arXiv:2605.04098v1 · Announce Type: new · Abstract: Multimodal large language models (MLLMs) have demonstrated promise on publicly available dermatology benchmarks. However, benchmark performance may not generalize to real-world dermatologic decision-making. To quantify this benchmar…