Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 7h

AVI-Bench: Toward Human-like Audio-Visual Intelligence of Omni-MLLMs

Researchers have introduced AVI-Bench, a benchmark for evaluating the audio-visual intelligence of Omni-Multimodal Large Language Models (Omni-MLLMs). It assesses models across three stages (perception, understanding, and reasoning) using tasks that require joint audio-visual interpretation. An extension, AVI-Bench-PriSe, tests robustness on unfamiliar stimuli to gauge generalization beyond typical training data. Experiments indicate that current Omni-MLLMs show significant limitations in audio-visual intelligence.

IMPACT Gives researchers a benchmark for measuring, and ultimately improving, the audio-visual capabilities of multimodal AI models.
