PulseAugur

New Italian clinical notes corpus aims to boost medical LLMs

Researchers have introduced EDEN (Emergency Department Electronic Notes), a large corpus of Italian clinical notes intended to support Large Language Models in medical applications. The dataset comprises approximately 4 million anonymized notes from emergency departments of Italian hospitals, of which a subset of 6,000 notes has been annotated by clinical experts. The annotation covers 132 items related to patient situations such as dyspnea and loss of consciousness, making it a rich, albeit imbalanced, resource for structured information extraction tasks. EDEN aims to be the largest freely available corpus of Italian clinical notes; it provides a benchmark for CRF-filling and baseline results from the Gemma-27B and MedGemma-27B models.

IMPACT Provides a large-scale, specialized dataset to improve LLM performance in Italian medical contexts.

RANK_REASON The cluster describes a new research paper introducing a dataset for AI research.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.AI TIER_1 English(EN) · Tiziano Labruna, Guido Bertolini, Pietro Ferrazzi, Bernardo Magnini

    EDEN: A Large-Scale Corpus of Clinical Notes for Italian

    arXiv:2606.12569v1 Announce Type: cross Abstract: We present EDEN (Emergency Department Electronic Notes), a new and unique large-scale corpus of clinical notes produced in Emergency Departments of Italian hospitals. The corpus, in its current version, is composed of approximatel…