PulseAugur
Reliability Auditing for Downstream LLM tasks in Psychiatry: LLM-Generated Hospitalization Risk Scores

LLMs show instability in psychiatric risk scoring when irrelevant data is included

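A new study evaluated the reliability of large language models (LLMs) in predicting psychiatric hospitalization risk. The researchers found that including medically insignificant details in patient profiles significantly increased both the predicted risk scores and the output variability of all four audited LLMs: Gemini 2.5 Flash, LLaMa 3.3 70b, Claude Sonnet 4.6, and GPT-4o mini. The study stresses that LLM-based psychiatric assessments are sensitive to non-clinical information, underscoring the need for systematic evaluation before clinical deployment.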
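As a rough illustration of the two effects reported above (higher scores and more variable output), the sketch below compares repeated risk scores for one patient profile with and without irrelevant details. The function name `audit_shift`, the score scale, and every number are hypothetical placeholders invented for this example; they are not the paper's method or data.

```python
from statistics import mean, stdev

def audit_shift(baseline, perturbed):
    """Summarize how repeated risk scores change when irrelevant details are added.

    Both arguments are lists of scores from repeated runs on the same profile.
    """
    return {
        # Positive value: scores rose after adding irrelevant details.
        "mean_shift": mean(perturbed) - mean(baseline),
        # Sample standard deviations: larger means less stable output.
        "spread_baseline": stdev(baseline),
        "spread_perturbed": stdev(perturbed),
    }

# Placeholder scores, invented for illustration only (NOT from the study).
baseline = [3.0, 3.0, 4.0, 3.0, 4.0]
perturbed = [4.0, 6.0, 5.0, 7.0, 3.0]

result = audit_shift(baseline, perturbed)
print(result)
```

A real audit would presumably repeat this over many profiles and models and apply a statistical test; this sketch only shows the shape of the comparison.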

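Impact: Reveals potential unreliability in LLM clinical risk assessment and urges caution before deployment in sensitive fields such as psychiatry.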

Ranking rationale: Academic paper detailing a new method for evaluating LLM reliability in a specific domain.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.LG · Shevya Pandya, Shinjini Bose, Ananya Joshi

    Reliability Auditing for Downstream LLM tasks in Psychiatry: LLM-Generated Hospitalization Risk Scores

    arXiv:2604.22063v1. Abstract: Large language models (LLMs) are increasingly utilized in clinical reasoning and risk assessment. However, their interpretive reliability in critical and indeterminate domains such as psychiatry remains unclear. Prior work has ident…

  2. arXiv cs.AI · Ananya Joshi

    Reliability Auditing for Downstream LLM tasks in Psychiatry: LLM-Generated Hospitalization Risk Scores

    Large language models (LLMs) are increasingly utilized in clinical reasoning and risk assessment. However, their interpretive reliability in critical and indeterminate domains such as psychiatry remains unclear. Prior work has identified algorithmic biases and prompt sensitivity …