New CHRONOSIGHT benchmark reveals "chronological blindness" in VLMs

By the PulseAugur editorial team · [2 sources] · 2026-06-15 07:38

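Researchers have introduced CHRONOSIGHT, a new benchmark designed to evaluate the temporal reasoning abilities of vision-language models (VLMs). It covers five key areas: chronological ordering, stage localization, elapsed-time estimation, reversed-sequence detection, and temporal anomaly identification. Humans score 0.89 on average on CHRONOSIGHT, while the best-performing open-source VLM, Qwen2.5-VL-7B, reaches only 0.40. The authors call this substantial gap "chronological blindness". Fine-tuning with LoRA on a small dataset improves performance on specific tasks, which suggests that instruction following may be the bottleneck.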
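To make the size of the reported gap concrete, the short sketch below compares the two headline numbers. It assumes, as the summary implies but does not state, that both scores sit on the same 0 to 1 scale and are directly comparable; the function name `compare` is purely illustrative and not part of CHRONOSIGHT.

```python
# Headline CHRONOSIGHT scores as reported in the summary (higher is better).
# Assumption: both scores use the same 0-1 scale and can be compared directly.
HUMAN_SCORE = 0.89
BEST_OPEN_VLM_SCORE = 0.40  # Qwen2.5-VL-7B


def compare(human: float, model: float) -> tuple[float, float]:
    """Hypothetical helper: return (absolute gap, model score as a fraction of human score)."""
    gap = human - model
    relative = model / human
    # Round to suppress floating-point noise (e.g. 0.89 - 0.40 is not exactly 0.49).
    return round(gap, 2), round(relative, 3)


gap, relative = compare(HUMAN_SCORE, BEST_OPEN_VLM_SCORE)
print(f"absolute gap: {gap}")
print(f"model relative to human: {relative:.1%}")
```

Under that assumption, the best open-source model reaches a little under half of average human performance, with an absolute gap of 0.49.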

Impact: Highlights a significant gap in VLM temporal reasoning and points to directions for future model development and fine-tuning.

Ranking rationale: This cluster covers a new academic paper that introduces a benchmark for evaluating AI models.

AI-generated summary · Google Gemini · from 2 sources.

Coverage sources [2]

arXiv cs.CV TIER_1 English(EN) · Parthaw Goswami, Jaynto Goswami Deep · 2026-06-16 04:00

Chronological Blindness: Benchmarking Temporal Reasoning in Vision-Language Models with CHRONOSIGHT

arXiv:2606.16334v1 Announce Type: new Abstract: Human perception of visual scenes is inherently temporal. We instinctively recognise whether a fruit is ripening or rotting, whether construction is progressing or being demolished, and approximately how much time separates two phot…
arXiv cs.CV TIER_1 English(EN) · Jaynto Goswami Deep · 2026-06-15 07:38

Chronological Blindness: Benchmarking Temporal Reasoning in Vision-Language Models with CHRONOSIGHT

Human perception of visual scenes is inherently temporal. We instinctively recognise whether a fruit is ripening or rotting, whether construction is progressing or being demolished, and approximately how much time separates two photographs of the same subject. Whether large visio…
