New benchmark AVI-Bench reveals limitations in Omni-MLLM audio-visual intelligence

By PulseAugur Editorial · [1 source] · 2026-06-09 04:00

Researchers have introduced AVI-Bench, a benchmark for evaluating the audio-visual intelligence of Omni-Multimodal Large Language Models (Omni-MLLMs). It assesses models across the perception, understanding, and reasoning stages, using tasks that require joint audio-visual interpretation. An extension, AVI-Bench-PriSe, tests robustness on unfamiliar stimuli to gauge generalization beyond typical training data. Experiments indicate that current Omni-MLLMs have significant limitations in audio-visual intelligence.

IMPACT Provides a new framework for evaluating and improving the audio-visual capabilities of multimodal AI models.

RANK_REASON The cluster contains a research paper introducing a new benchmark for evaluating AI models.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Yaoting Wang, Ziyi Zhang, Wenming Tu, Shaoxuan Xu, Wenjie Du, Cheng Liang, Weijun Wang, Yuanchao Li, Guangyao Li, Hao Fei, Yuanchun Li, Henghui Ding, Yunxin Liu · 2026-06-09 04:00

AVI-Bench: Toward Human-like Audio-Visual Intelligence of Omni-MLLMs

arXiv:2606.07643v1 Announce Type: cross Abstract: Recent advances in Omni-Multimodal Large Language Models (Omni-MLLMs) have enabled strong integration of vision, audio, and language. However, their audio-visual intelligence (AVI) remains insufficiently evaluated due to the lack …

