Researchers have introduced SHIELD, a new dataset comprising 1,394 clinical notes with over 10,000 identified Protected Health Information (PHI) spans. This dataset aims to address the limitations of older benchmarks by offering greater diversity in modern clinical narratives. The project also developed distilled Small Language Models (SLMs) capable of de-identifying clinical text efficiently on standard hardware, achieving high precision and recall.
Impact: Provides a more diverse dataset and efficient models for de-identifying clinical text, potentially enabling broader secondary use of electronic health record (EHR) data.
Ranking rationale: The cluster contains an academic paper detailing a new dataset and distilled models for de-identification.
AI-generated summary · Google Gemini · from 2 sources.