PulseAugur

SHIELD: A Diverse Clinical Note Dataset and Distilled Small Language Models for Enterprise-Scale…

Researchers have introduced SHIELD, a dataset of 1,394 clinical notes containing more than 10,000 identified Protected Health Information (PHI) spans. The dataset aims to address the limitations of older benchmarks by covering a more diverse range of modern clinical narratives. The project also developed distilled Small Language Models (SLMs) that de-identify clinical text efficiently on standard hardware while achieving high precision and recall.

Impact: Provides a more diverse dataset and efficient models for de-identifying clinical text, potentially enabling broader secondary use of EHR data.

Ranking rationale: The cluster contains an academic paper detailing a new dataset and distilled models for de-identification.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 2 sources.


Sources [2]

  1. arXiv cs.CL TIER_1 English(EN) · Jose D. Posada, David Love, Somalee Datta, Priya Desai

    SHIELD: A Diverse Clinical Note Dataset and Distilled Small Language Models for Enterprise-Scale De-identification

    arXiv:2605.03301v1. Abstract: De-identification of clinical text remains essential for secondary use of electronic health records (EHRs), yet public benchmarks such as i2b2 2006/2014 are over a decade old and lack the semantic and demographic diversity of modern…

  2. arXiv cs.CL TIER_1 English(EN) · Priya Desai

    SHIELD: A Diverse Clinical Note Dataset and Distilled Small Language Models for Enterprise-Scale De-identification

    De-identification of clinical text remains essential for secondary use of electronic health records (EHRs), yet public benchmarks such as i2b2 2006/2014 are over a decade old and lack the semantic and demographic diversity of modern narratives. While Large Language Models (LLMs) …