Researchers have introduced AVI-Bench, a new benchmark designed to evaluate the audio-visual intelligence of Omni-Multimodal Large Language Models (Omni-MLLMs). The benchmark assesses models across three stages (perception, understanding, and reasoning) using tasks that require joint interpretation of audio and visual input. An extension, AVI-Bench-PriSe, further tests robustness with unfamiliar stimuli to gauge generalization beyond typical training data. Experiments indicate that current Omni-MLLMs have significant limitations in audio-visual intelligence.
IMPACT Provides a new framework for evaluating and improving the audio-visual capabilities of multimodal AI models.
RANK_REASON The cluster contains a research paper introducing a new benchmark for evaluating AI models.
AI-generated summary · Google Gemini · from 1 source.