Researchers have introduced the DoWhatISay (DOWIS) dataset, a multilingual collection of human-recorded spoken and written prompts designed to evaluate speech large language models (SLLMs) under realistic spoken-instruction conditions. The dataset spans 9 tasks and 11 languages, offering 10 prompt variants per task-language pair across five styles. Initial benchmarking with DOWIS showed that text prompts generally outperform spoken prompts, especially in low-resource and cross-lingual scenarios, though spoken prompts show promise for tasks requiring speech output.
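To make the evaluation grid concrete, a benchmarking loop over the reported task × language × variant layout might look like the minimal sketch below. All names here (the SLLM call, prompt fields, and scorer) are hypothetical stand-ins for illustration, not the paper's released tooling.

    # Sketch of a text-vs-spoken prompt comparison over a DOWIS-like grid.
    # run_sllm and score are placeholder stubs, not the paper's actual API.
    from itertools import product
    from collections import defaultdict

    TASKS = [f"task{i}" for i in range(1, 10)]    # 9 tasks
    LANGS = [f"lang{i}" for i in range(1, 12)]    # 11 languages
    VARIANTS = range(10)                          # 10 prompt variants per pair

    def run_sllm(prompt: str, modality: str) -> str:
        """Stand-in for querying a real SLLM with a text or spoken prompt."""
        return f"response to {modality} prompt: {prompt}"

    def score(task: str, output: str) -> float:
        """Stand-in task metric (e.g. an accuracy- or WER-derived score)."""
        return float(len(output) % 2)

    results: dict[tuple[str, str], list[float]] = defaultdict(list)
    for task, lang, v in product(TASKS, LANGS, VARIANTS):
        prompt = f"{task}/{lang}/variant{v}"
        for modality in ("text", "spoken"):
            results[(modality, lang)].append(score(task, run_sllm(prompt, modality)))

    # Aggregate per modality to compare text vs spoken prompts overall.
    for modality in ("text", "spoken"):
        vals = [s for (m, _), ss in results.items() if m == modality for s in ss]
        print(modality, sum(vals) / len(vals))

Aggregating per (modality, language) rather than globally is what would surface the reported gap in low-resource and cross-lingual settings.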
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Provides a new benchmark for evaluating SLLMs with spoken prompts, potentially improving real-world interaction capabilities.
RANK_REASON The cluster contains an academic paper detailing a new dataset for evaluating speech large language models.