Audio2Tool dataset evaluates SpeechLMs on complex voice commands

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 1 source

Researchers have introduced Audio2Tool, a benchmark dataset for evaluating the function-calling capabilities of spoken language models (SpeechLMs). The dataset includes approximately 30,000 queries across smart car, smart home, and wearable domains, organized in a complexity hierarchy that runs from simple commands to multi-intent requests. Evaluations of current state-of-the-art models showed significant performance degradation under compositional challenges and acoustic variation, which points to areas for future improvement.


IMPACT Introduces a new benchmark to better evaluate spoken language models' ability to call tools, potentially driving improvements in voice assistant capabilities.

RANK_REASON The cluster describes a new academic paper introducing a novel dataset and benchmark for evaluating spoken language models.



COVERAGE [1]

arXiv cs.LG TIER_1 · Ramit Pahwa, Apoorva Beedu, Parivesh Priye, Rutu Gandhi, Saloni Takawale, Aruna Baijal, Zengli Yang · 2026-04-28 04:00

Audio2Tool: Bridging Spoken Language Understanding and Function Calling

arXiv:2604.22821v1 Announce Type: cross Abstract: Voice assistants increasingly rely on Speech Language Models (SpeechLMs) to interpret spoken queries and execute complex tasks, yet existing benchmarks lack domain breadth, acoustic diversity, and compositional reasoning complexit…
