Researchers have introduced the DoWhatISay (DOWIS) dataset, a multilingual collection of human-recorded spoken and written prompts designed to evaluate speech large language models (SLLMs) under realistic spoken-instruction conditions. The dataset spans 9 tasks and 11 languages, offering 10 prompt variants per task-language pair across five styles. Initial benchmarking with DOWIS showed that text prompts generally outperform spoken prompts, especially in low-resource and cross-lingual scenarios, though spoken prompts show promise for tasks requiring speech output.
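To make the evaluation grid concrete, a benchmarking loop over the reported task × language × variant layout might look like the minimal sketch below. All names here (the SLLM call, prompt fields, and scorer) are hypothetical stand-ins for illustration, not the paper's released tooling.

    # Sketch of a text-vs-spoken prompt comparison over a DOWIS-like grid.
    # run_sllm and score are placeholder stubs, not the paper's actual API.
    from itertools import product
    from collections import defaultdict

    TASKS = [f"task{i}" for i in range(1, 10)]    # 9 tasks
    LANGS = [f"lang{i}" for i in range(1, 12)]    # 11 languages
    VARIANTS = range(10)                          # 10 prompt variants per pair

    def run_sllm(prompt: str, modality: str) -> str:
        """Stand-in for querying a real SLLM with a text or spoken prompt."""
        return f"response to {modality} prompt: {prompt}"

    def score(task: str, output: str) -> float:
        """Stand-in task metric (e.g. an accuracy- or WER-derived score)."""
        return float(len(output) % 2)

    results: dict[tuple[str, str], list[float]] = defaultdict(list)
    for task, lang, v in product(TASKS, LANGS, VARIANTS):
        prompt = f"{task}/{lang}/variant{v}"
        for modality in ("text", "spoken"):
            results[(modality, lang)].append(score(task, run_sllm(prompt, modality)))

    # Aggregate per modality to compare text vs spoken prompts overall.
    for modality in ("text", "spoken"):
        vals = [s for (m, _), ss in results.items() if m == modality for s in ss]
        print(modality, sum(vals) / len(vals))

Aggregating per (modality, language) rather than globally is what would surface the reported gap in low-resource and cross-lingual settings.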
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Provides a new benchmark for evaluating SLLMs with spoken prompts, potentially improving real-world interaction capabilities.
RANK_REASON The cluster contains an academic paper detailing a new dataset for evaluating speech large language models.