PulseAugur

New DiCoBench benchmark reveals MLLM struggles with high-resolution visual perception

Researchers have introduced DiCoBench, a new benchmark designed to evaluate the fine-grained perception capabilities of Multimodal Large Language Models (MLLMs) on high-resolution, multi-image inputs. The benchmark contains 765 samples spanning two tracks and eight perception tasks, built around differential and commonality visual cues. An evaluation of 18 MLLMs revealed a significant gap between model and human accuracy, showing that current models struggle to capture micro-scale visual details.

IMPACT Highlights limitations in current MLLMs for high-resolution visual tasks, potentially guiding future research in perception capabilities.

RANK_REASON The cluster describes a new academic benchmark paper for evaluating AI models.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CV TIER_1 English(EN) · Geng Li, Yuxin Peng

    DiCoBench: Benchmarking Multi-Image Fine-Grained Perception via Differential and Commonality Visual Cues

    arXiv:2606.26602v1 Announce Type: new Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated impressive fine-grained perception capabilities. However, existing benchmarks predominantly rely on explicit textual cues or low-resolution inputs, fa…

  2. arXiv cs.CV TIER_1 English(EN) · Yuxin Peng

    DiCoBench: Benchmarking Multi-Image Fine-Grained Perception via Differential and Commonality Visual Cues

    Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated impressive fine-grained perception capabilities. However, existing benchmarks predominantly rely on explicit textual cues or low-resolution inputs, failing to evaluate a model's ability to autonomou…