COHERENCE: Benchmarking Fine-Grained Image-Text Alignment in Interleaved Multimodal Contexts

The COHERENCE benchmark evaluates how well MLLMs align images and text at a fine-grained level in interleaved contexts

By the PulseAugur editorial team · [2 sources] · 2026-04-30 03:59

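Researchers have introduced COHERENCE, a new benchmark for evaluating the fine-grained image-text alignment capabilities of multimodal large language models (MLLMs). Existing benchmarks often overlook the complexity of the interleaved image-text contexts found in real-world documents. COHERENCE addresses this gap by testing how well MLLMs connect visual and textual information in mixed-media settings. It spans four distinct domains and contains more than 6,000 questions.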

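Impact: The benchmark gives multimodal models a new evaluation framework and highlights the current limitations in understanding interleaved image-text data.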

Ranking rationale: This is a research paper introducing a new benchmark for evaluating AI models.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI · Bingli Wang, Huanze Tang, Haijun Lv, Zhishan Lin, Lixin Gu, Lei Feng, Qipeng Guo, Kai Chen · 2026-05-01 04:00

COHERENCE: Benchmarking Fine-Grained Image-Text Alignment in Interleaved Multimodal Contexts

arXiv:2604.27389v1 Announce Type: cross Abstract: In recent years, Multimodal Large Language Models (MLLMs) have achieved remarkable progress on a wide range of multimodal benchmarks. Despite these advances, most existing benchmarks mainly focus on single-image or multi-image com…

arXiv cs.CV · Kai Chen · 2026-04-30 03:59

COHERENCE: Benchmarking Fine-Grained Image-Text Alignment in Interleaved Multimodal Contexts

In recent years, Multimodal Large Language Models (MLLMs) have achieved remarkable progress on a wide range of multimodal benchmarks. Despite these advances, most existing benchmarks mainly focus on single-image or multi-image comprehension. In real-world scenarios such as docume…
