
RESEARCH · arXiv cs.LG English(EN) · 1mo

Audio2Tool: Bridging Spoken Language Understanding and Function Calling

Researchers have introduced Audio2Tool, a benchmark dataset for evaluating the function-calling capabilities of spoken language models. The dataset contains approximately 30,000 queries across smart car, smart home, and wearable domains, organized in a complexity hierarchy that ranges from simple commands to multi-intent requests. Evaluations of current state-of-the-art models revealed significant performance degradation on compositional queries and under acoustic variation, highlighting areas for future improvement.

IMPACT: Introduces a new benchmark to better evaluate spoken language models' ability to call tools, potentially driving improvements in voice assistant capabilities.

Audio2Tool
SpeechLM
ASR-LLM