Researchers have introduced DiCoBench, a benchmark for evaluating the fine-grained perception of Multimodal Large Language Models (MLLMs) on high-resolution, multi-image inputs. It comprises 765 samples across two tracks and eight perception tasks, focusing on differential and commonality visual cues. Evaluations of 18 MLLMs showed a significant gap between model and human accuracy, highlighting how difficult it remains for models to capture micro-scale details.
AI IMPACT Highlights limitations of current MLLMs on high-resolution visual tasks, potentially guiding future research on perception capabilities.
AI-generated summary · Google Gemini · from 2 sources.