Prompting language influences diagnostic reasoning and accuracy of large language models

Study finds LLM clinical accuracy varies with prompting language

By the PulseAugur editorial team · [1 source] · 2026-05-18 22:55

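A study newly posted on arXiv reports that the language used to prompt large language models significantly affects their diagnostic reasoning and accuracy in clinical settings. The researchers found that four of the five models evaluated performed better when prompted in English than in French, with English prompts scoring higher on differential diagnosis, logical structure, and internal validity. Only one model, o3, showed no significant performance difference between the two languages. The result underscores the need to account for linguistic and cultural factors when deploying LLMs equitably in healthcare.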

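Impact: The findings point to possible language-based disparities in LLM clinical decision support, which could affect equitable access to AI healthcare tools.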

Ranking rationale: Academic paper detailing an evaluation of model performance.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL · Pierre-Antoine Gourraud · 2026-05-18 22:55

Prompting language influences diagnostic reasoning and accuracy of large language models

Large language models (LLMs) are increasingly explored for clinical decision support, yet most evaluations are conducted in English, leaving their reliability in other languages uncertain. Here we evaluate the impact of prompting language on diagnostic reasoning and final diagnos…
