PulseAugur

New HEALTHDIAL dataset released for multilingual spoken dialogue systems

Researchers have introduced HEALTHDIAL, a new large-scale, multilingual dataset designed for developing and evaluating retrieval-augmented generation (RAG) systems in spoken dialogue. The dataset includes 6,000 information-seeking dialogues across Arabic, Chinese, English, and Spanish, grounded in World Health Organization (WHO) content. It also features 163 hours of recorded speech from native speakers and detailed demographic and sociolinguistic annotations. Initial benchmark results indicate performance disparities across languages, even for those considered high-resource.

IMPACT Enables development and evaluation of multilingual spoken dialogue systems, potentially improving access to health information.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CL TIER_1 English(EN) · Songbo Hu, Yinhong Liu, Ej Zhou, Evgeniia Razumovskaia, Xiaobin Wang, Alexander Fraser, Ivan Vulić, Anna Korhonen ·

    Dial HEALTHDIAL for Advice: A Multilingual and Multi-Parallel Spoken Dialogue Dataset for Knowledge-Grounded Information Seeking

    arXiv:2605.30107v1. Abstract: Creating spoken dialogue datasets is methodologically challenging, and these challenges are amplified when the goal is to build multilingual, multi-parallel datasets at scale. This work introduces HEALTHDIAL, a large-scale, multilin…

  2. arXiv cs.CL TIER_1 English(EN) · Anna Korhonen ·

    Dial HEALTHDIAL for Advice: A Multilingual and Multi-Parallel Spoken Dialogue Dataset for Knowledge-Grounded Information Seeking

    Creating spoken dialogue datasets is methodologically challenging, and these challenges are amplified when the goal is to build multilingual, multi-parallel datasets at scale. This work introduces HEALTHDIAL, a large-scale, multilingual, and multi-parallel dataset for developing …