LLM clinical accuracy varies significantly by prompting language, study finds

By PulseAugur Editorial · [1 source] · 2026-05-18 22:55

A new study published on arXiv finds that the language used to prompt large language models affects their diagnostic reasoning and accuracy in clinical settings. Four of the five evaluated models performed better when prompted in English than in French, with English prompts yielding higher scores for differential diagnosis, logical structure, and internal validity. Only one model, o3, showed no significant language-based difference in performance. The findings highlight the need to account for linguistic and cultural factors if LLMs are to be deployed equitably in healthcare worldwide.

IMPACT Highlights potential language-based disparities in LLM clinical decision support, which could affect equitable access to AI healthcare tools.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CL · TIER_1 · English (EN) · Pierre-Antoine Gourraud · 2026-05-18 22:55

Prompting language influences diagnostic reasoning and accuracy of large language models

Large language models (LLMs) are increasingly explored for clinical decision support, yet most evaluations are conducted in English, leaving their reliability in other languages uncertain. Here we evaluate the impact of prompting language on diagnostic reasoning and final diagnos…
