Researchers have introduced Audio2Tool, a new benchmark dataset designed to evaluate the function-calling capabilities of spoken language models. The dataset includes approximately 30,000 queries spanning the smart car, smart home, and wearable domains, organized in a complexity hierarchy that ranges from simple commands to multi-intent requests. Evaluations of current state-of-the-art models showed significant performance degradation when faced with compositional challenges and acoustic variations, highlighting areas for future improvement.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Introduces a new benchmark to better evaluate spoken language models' ability to call tools, potentially driving improvements in voice assistant capabilities.
RANK_REASON The cluster describes a new academic paper introducing a novel dataset and benchmark for evaluating spoken language models.