DiCoBench: Benchmarking Multi-Image Fine-Grained Perception via Differential and Commonality Visual Cues

New DiCoBench benchmark reveals MLLMs struggle with high-resolution visual perception

By the PulseAugur editorial team · [2 sources] · 2026-06-25 05:02

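Researchers have introduced DiCoBench, a new benchmark designed to evaluate the fine-grained perception of multimodal large language models (MLLMs) on high-resolution, multi-image inputs. The benchmark contains 765 samples across two tracks and eight perception tasks, built around differential and commonality visual cues. An evaluation of 18 MLLMs shows a significant performance gap relative to human accuracy, underscoring how difficult it remains for these models to capture fine-grained details.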

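Impact: Highlights the limitations of current MLLMs on high-resolution visual tasks and may help guide future research on perception capabilities.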

Ranking rationale: This cluster covers a new academic paper that introduces a benchmark for evaluating AI models.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv cs.CV · Tier 1 · English · Geng Li, Yuxin Peng · 2026-06-26 04:00

DiCoBench: Benchmarking Multi-Image Fine-Grained Perception via Differential and Commonality Visual Cues

arXiv:2606.26602v1 Announce Type: new Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated impressive fine-grained perception capabilities. However, existing benchmarks predominantly rely on explicit textual cues or low-resolution inputs, fa…
arXiv cs.CV · Tier 1 · English · Yuxin Peng · 2026-06-25 05:02

DiCoBench: Benchmarking Multi-Image Fine-Grained Perception via Differential and Commonality Visual Cues

Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated impressive fine-grained perception capabilities. However, existing benchmarks predominantly rely on explicit textual cues or low-resolution inputs, failing to evaluate a model's ability to autonomou…
