Researchers have introduced OMHBench, a new benchmark designed to evaluate the multi-hop reasoning capabilities of omni-modal large language models (MLLMs). The benchmark comprises 6,144 questions with balanced reasoning paths across text, vision, and speech modalities, aiming to overcome limitations of existing evaluation frameworks, such as modality shortcuts. Evaluations on OMHBench revealed a significant performance gap between proprietary and open-source MLLMs, with even leading proprietary models showing sensitivity to reasoning path variations and struggling particularly with speech processing.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Introduces a new evaluation standard for omni-modal LLMs, highlighting current model weaknesses in speech processing and balanced reasoning.
RANK_REASON The cluster describes a new academic paper introducing a novel benchmark for evaluating multimodal large language models.