Brief

last 24h

[2/2] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

RESEARCH · arXiv cs.AI English(EN) · 1w · [3 sources]

GlobeAudio: A Multilingual Multicultural Benchmark for Naturalistic Evaluation of Large Audio-Language Models

Researchers have introduced GlobeAudio, a new benchmark designed to evaluate Large Audio-Language Models (LALMs) in more realistic, naturalistic settings. The benchmark features 5,637 multiple-choice questions in six diverse languages, created by native speakers using naturally occurring audio. Initial evaluations using GlobeAudio revealed significant performance disparities, especially for open-source models and less common languages, highlighting current limitations in LALM capabilities. AI

IMPACT Highlights critical limitations in current LALMs and emphasizes the need for more realistic audio evaluation methods.
RESEARCH · arXiv cs.LG English(EN) · 1mo

Audio2Tool: Bridging Spoken Language Understanding and Function Calling

Researchers have introduced Audio2Tool, a new benchmark dataset designed to evaluate the function-calling capabilities of spoken language models. The dataset includes approximately 30,000 queries across smart car, smart home, and wearable domains, featuring a complexity hierarchy from simple commands to multi-intent requests. Evaluations of current state-of-the-art models revealed significant performance degradation when faced with compositional challenges and acoustic variations, highlighting areas for future improvement. AI

IMPACT Introduces a new benchmark to better evaluate spoken language models' ability to call tools, potentially driving improvements in voice assistant capabilities.

Brief

GlobeAudio: A Multilingual Multicultural Benchmark for Naturalistic Evaluation of Large Audio-Language Models

Audio2Tool: Bridging Spoken Language Understanding and Function Calling