GlobeAudio: A Multilingual Multicultural Benchmark for Naturalistic Evaluation of Large Audio-Language Models
Researchers have introduced GlobeAudio, a new benchmark designed to evaluate Large Audio-Language Models (LALMs) in more realistic, naturalistic settings. The benchmark features 5,637 multiple-choice questions in six diverse languages, created by native speakers using naturally occurring audio. Initial evaluations using GlobeAudio revealed significant performance disparities, especially for open-source models and less common languages, highlighting current limitations in LALM capabilities.
AI IMPACT: Highlights critical limitations in current LALMs and emphasizes the need for more realistic audio evaluation methods.