PulseAugur

Clinical AI Fails on Complex Questions Due to Transformer Limits

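A new research paper posted on arXiv examines the limitations of large language models in clinical question answering. It finds that the accuracy of models such as Claude Sonnet, GPT-4o and GPT-5.4-2026-03-05 drops markedly as the number of inferential steps a clinical question requires increases. The author attributes this decline to the compositional reasoning limits inherent in the Transformer architecture, not to truncation of electronic health record (EHR) data.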
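The following is a minimal sketch of how stratifying accuracy by reasoning depth can expose a pattern that a single aggregate score hides. It assumes each question has already been annotated with a count of inferential steps ("depth") and graded as correct or incorrect. The function names and the toy numbers are hypothetical illustrations. They are not the paper's method or results.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Each record is (inferential_depth, answered_correctly); hypothetical format.
Record = Tuple[int, bool]


def accuracy_by_depth(results: Iterable[Record]) -> Dict[int, float]:
    """Return accuracy for each depth, keyed in ascending depth order."""
    totals: Dict[int, int] = defaultdict(int)
    correct: Dict[int, int] = defaultdict(int)
    for depth, is_correct in results:
        totals[depth] += 1
        correct[depth] += int(is_correct)
    return {d: correct[d] / totals[d] for d in sorted(totals)}


def aggregate_accuracy(results: Iterable[Record]) -> float:
    """Return overall accuracy, ignoring depth (the 'benchmark' view)."""
    records: List[Record] = list(results)
    if not records:
        raise ValueError("no results to score")
    return sum(ok for _, ok in records) / len(records)


# Toy data invented for illustration: 10 questions per depth,
# with 9, 7 and 4 correct answers at depths 1, 2 and 3.
toy: List[Record] = (
    [(1, i < 9) for i in range(10)]
    + [(2, i < 7) for i in range(10)]
    + [(3, i < 4) for i in range(10)]
)

per_depth = accuracy_by_depth(toy)
overall = aggregate_accuracy(toy)
print(per_depth)          # accuracy falls as depth rises
print(round(overall, 3))  # one number that hides that trend
```

Here a single overall score of about 0.667 conceals the drop from 0.9 at depth 1 to 0.4 at depth 3. Reporting per-depth accuracy is one simple way to support the kind of risk stratification described below.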

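Impact: By showing that accuracy falls as question complexity rises, the study highlights potential risks in deploying clinical AI and points to the need for careful risk stratification.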

Ranking rationale: This cluster contains a research paper posted on arXiv that reports empirical findings on AI model performance.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.AI TIER_1 English(EN) · Sanjay Basu ·

    Compositional Reasoning Depth Predicts Clinical AI Failure: Empirical Evidence Consistent with Transformer Compositionality Limits in Electronic Health Record Question Answering

    arXiv:2606.16890v1 Announce Type: cross Abstract: Aggregate accuracy benchmarks conceal a systematic structure in how large language models fail at electronic health record (EHR) question answering: questions requiring more inferential steps produce disproportionately more errors…

  2. arXiv cs.AI TIER_1 English(EN) · Sanjay Basu ·

    Compositional Reasoning Depth Predicts Clinical AI Failure: Empirical Evidence Consistent with Transformer Compositionality Limits in Electronic Health Record Question Answering

    Aggregate accuracy benchmarks conceal a systematic structure in how large language models fail at electronic health record (EHR) question answering: questions requiring more inferential steps produce disproportionately more errors. Motivated by theoretical results on transformer …