PulseAugur
When Symptoms Are Not Enough: Evidence-Weighting Patterns in Large Language Model Psychiatric Screening

Large language models perform unevenly in psychiatric screening and need validation

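A new study posted on arXiv evaluated five large language models on psychiatric screening, using a benchmark of 555 interviews. Accuracy varied across models, with GPT-4.1 Mini and GPT-5 Mini producing the most consistent results. The researchers found that the models tended to underweight symptom evidence when patients reported intact functioning or social support, a pattern that underscores the need for careful validation before clinical use.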

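Impact: Large language models show promise for scalable psychiatric screening, but bias in how they interpret evidence means they require careful validation.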

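Ranking rationale: This cluster contains an academic paper that details research on the capabilities and limitations of large language models.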

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.CL · Jianfeng Zhu, Megan Korhummel, Ruoming Jin, Karin G. Coifman

    When Symptoms Are Not Enough: Evidence-Weighting Patterns in Large Language Model Psychiatric Screening

    arXiv:2605.23148v1. Abstract: As demand for mental health care outpaces clinician-delivered assessment, scalable screening tools are increasingly needed. Large language models (LLMs) may identify psychiatric risk from patient narratives, but their reliability ac…

  2. arXiv cs.CL · Karin G. Coifman

    When Symptoms Are Not Enough: Evidence-Weighting Patterns in Large Language Model Psychiatric Screening

    As demand for mental health care outpaces clinician-delivered assessment, scalable screening tools are increasingly needed. Large language models (LLMs) may identify psychiatric risk from patient narratives, but their reliability across diagnoses, demographic subgroups, and evide…