PulseAugur

New dataset 'Do What I Say' evaluates speech LLMs with spoken prompts

Researchers have introduced DoWhatISay (DOWIS), a multilingual dataset of human-recorded spoken and written prompts for evaluating speech large language models (SLLMs) under realistic spoken-instruction conditions. The dataset spans 9 tasks and 11 languages, with 10 prompt variants per task-language pair across five styles. Initial benchmarking on DOWIS showed that text prompts generally outperform spoken prompts, especially in low-resource and cross-lingual settings, though spoken prompts show promise for tasks that require speech output.
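As a back-of-the-envelope check on the figures above, the prompt grid implied by the summary can be sketched as follows. This is a hypothetical calculation, assuming (the source does not say) that the 10 variants per task-language pair already span the five styles rather than multiplying with them:

```python
# Rough size of the DOWIS prompt grid as described in the summary.
# Assumption: the 10 variants per task-language pair cover the five
# styles, so styles are not an additional multiplier.
TASKS = 9
LANGUAGES = 11
VARIANTS_PER_PAIR = 10

total_prompts = TASKS * LANGUAGES * VARIANTS_PER_PAIR
print(total_prompts)  # 990 prompts under this assumption
```

If instead each style contributed its own 10 variants, the grid would be five times larger; the paper itself is the authority on the exact count.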

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Provides a benchmark for evaluating SLLMs on spoken rather than text prompts, closer to how users actually interact with these models in the real world.

RANK_REASON The cluster contains an academic paper detailing a new dataset for evaluating speech large language models.

Read on arXiv cs.CL →

COVERAGE [1]

  1. arXiv cs.CL TIER_1 · Maike Züfle, Sara Papi, Fabian Retkowski, Szymon Mazurek, Marek Kasztelnik, Alexander Waibel, Luisa Bentivogli, Jan Niehues

    Do What I Say: A Spoken Prompt Dataset for Instruction-Following

    arXiv:2603.09881v2 · Announce Type: replace · Abstract: Speech Large Language Models (SLLMs) have rapidly expanded, supporting a wide range of tasks. These models are typically evaluated using text prompts, which may not reflect real-world scenarios where users interact with speech. …