Audio-FLAN: An Instruction-Following Dataset for Unified Audio Understanding and Generation of Speech, Music, and Sound
Researchers have introduced Audio-FLAN, a large-scale instruction-following dataset designed to unify audio understanding and generation tasks for large language models. The dataset comprises over 100 million instances across 80 diverse tasks, covering the speech, music, and general sound domains. Audio-FLAN is intended to lay the groundwork for unified audio-language models that can handle both comprehension and creation of audio content in a zero-shot manner.
IMPACT: Provides instruction-tuning data for unified audio-language models covering diverse understanding and generation tasks.