PulseAugur
COHERENCE: Benchmarking Fine-Grained Image-Text Alignment in Interleaved Multimodal Contexts

COHERENCE benchmark evaluates MLLMs' fine-grained image-text alignment in interleaved contexts

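Researchers have introduced COHERENCE, a new benchmark for evaluating the fine-grained image-text alignment capabilities of multimodal large language models (MLLMs). Existing benchmarks often overlook the complexity of the interleaved image-text contexts found in real documents. COHERENCE addresses this gap by evaluating how well MLLMs connect visual and textual information in mixed-media settings. It spans four distinct domains and contains more than 6,000 questions.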

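Impact: Provides a new evaluation framework for multimodal models and highlights current limitations in understanding interleaved image-text data.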

Ranking rationale: This is a research paper introducing a new benchmark for evaluating AI models.


AI-generated summary · Google Gemini · from 2 sources.


Sources [2]

  1. arXiv cs.AI TIER_1 English(EN) · Bingli Wang, Huanze Tang, Haijun Lv, Zhishan Lin, Lixin Gu, Lei Feng, Qipeng Guo, Kai Chen

    COHERENCE: Benchmarking Fine-Grained Image-Text Alignment in Interleaved Multimodal Contexts

    arXiv:2604.27389v1 Announce Type: cross Abstract: In recent years, Multimodal Large Language Models (MLLMs) have achieved remarkable progress on a wide range of multimodal benchmarks. Despite these advances, most existing benchmarks mainly focus on single-image or multi-image com…

  2. arXiv cs.CV TIER_1 English(EN) · Kai Chen

    COHERENCE: Benchmarking Fine-Grained Image-Text Alignment in Interleaved Multimodal Contexts

    In recent years, Multimodal Large Language Models (MLLMs) have achieved remarkable progress on a wide range of multimodal benchmarks. Despite these advances, most existing benchmarks mainly focus on single-image or multi-image comprehension. In real-world scenarios such as docume…