PulseAugur

LLM clinical accuracy varies significantly by prompting language, study finds

A new study posted to arXiv finds that the language used to prompt large language models significantly affects their diagnostic reasoning and accuracy in clinical settings. The researchers found that four of the five evaluated models performed better when prompted in English than in French, with English yielding higher scores for differential diagnosis, logical structure, and internal validity. Only one model, o3, showed no significant language-based difference in performance. The authors argue this highlights the need to consider linguistic and cultural factors for the equitable global deployment of LLMs in healthcare.

Impact: The findings point to potential language-based disparities in LLM clinical decision support, with consequences for equitable access to AI healthcare tools.

Ranking rationale: Academic paper detailing an evaluation of model performance.


AI-generated summary · Google Gemini · from 1 source.


Sources [1]

  1. arXiv cs.CL · Tier 1 · English (EN) · Pierre-Antoine Gourraud

    Prompting language influences diagnostic reasoning and accuracy of large language models

    Large language models (LLMs) are increasingly explored for clinical decision support, yet most evaluations are conducted in English, leaving their reliability in other languages uncertain. Here we evaluate the impact of prompting language on diagnostic reasoning and final diagnos…