Before the Labels: How Dataset Construction Shapes Suicidality Detection in Clinical Text

Study: How Clinical NLP Datasets Are Built Shapes Suicidality Detection

By the PulseAugur editorial team · [2 sources] · 2026-06-17 22:31

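A new paper argues that the way clinical text datasets are constructed significantly affects both the accuracy and the interpretation of suicidality detection in natural language processing (NLP). The authors point out that datasets built from electronic health records (EHRs), such as the MIMIC-III-derived ScAN dataset, typically encode clinicians' judgments and operationalize suicidality as a bounded event. This framing can obscure the temporality, negation, and uncertainty present in the original clinical language. As a result, NLP model outputs may be misread.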

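Impact: The findings underscore the need for careful dataset curation and interpretation in clinical NLP if AI applications in this area are to be accurate and ethical.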

Ranking rationale: This cluster contains an academic paper discussing AI research methodology.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI · Priyanshi Garg, Ishita Rao, Jieqiong Ding, Amandalynne Paullada · 2026-06-19 04:00

Before the Labels: How Dataset Construction Shapes Suicidality Detection in Clinical Text

arXiv:2606.19637v1 Announce Type: cross Abstract: Clinical NLP increasingly relies on electronic health record (EHR) data to detect suicidal behaviors, treating clinical documentation as more reliable ground truth than social media. We argue that this framing obscures how EHR-bas…
arXiv cs.CL · Amandalynne Paullada · 2026-06-17 22:31

Before the Labels: How Dataset Construction Shapes Suicidality Detection in Clinical Text

Clinical NLP increasingly relies on electronic health record (EHR) data to detect suicidal behaviors, treating clinical documentation as more reliable ground truth than social media. We argue that this framing obscures how EHR-based suicidality datasets encode a particular operat…
