PulseAugur

New CHRONOSIGHT Benchmark Reveals VLM 'Chronological Blindness'

Researchers have introduced CHRONOSIGHT, a benchmark for evaluating the temporal reasoning of vision-language models (VLMs). It covers five tasks: chronological ordering, stage localization, elapsed-time estimation, detection of reversed sequences, and identification of temporal outliers. Humans average 0.89 on CHRONOSIGHT, while the best-performing open-source VLM, Qwen2.5-VL-7B, reaches only 0.40, a gap the authors term 'chronological blindness'. Fine-tuning with LoRA on a small dataset improved performance on specific tasks, suggesting that instruction following may be a bottleneck.
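
As a rough illustration of how a chronological-ordering task could be scored, the sketch below computes the fraction of image pairs that a model places in the correct relative order. This is an assumed metric chosen for illustration, not the scoring rule used by CHRONOSIGHT, and the function name and example labels are hypothetical. The two averages at the end are the figures reported in the summary above.

```python
from itertools import combinations

def pairwise_order_accuracy(predicted, truth):
    """Fraction of item pairs whose relative order in `predicted` matches `truth`.

    Hypothetical scoring rule for illustration only; the benchmark's
    actual metric is not described in the summary.
    """
    position = {item: i for i, item in enumerate(predicted)}
    # Each pair is (earlier, later) according to the ground-truth order.
    pairs = list(combinations(truth, 2))
    if not pairs:
        return 1.0
    agree = sum(1 for earlier, later in pairs if position[earlier] < position[later])
    return agree / len(pairs)

# Averages reported in the summary (human vs. best open-source VLM).
HUMAN_AVG = 0.89
BEST_OPEN_VLM_AVG = 0.40
gap = round(HUMAN_AVG - BEST_OPEN_VLM_AVG, 2)

if __name__ == "__main__":
    truth = ["unripe", "ripe", "rotting"]   # hypothetical stage labels
    guess = ["unripe", "rotting", "ripe"]   # hypothetical model output
    print(pairwise_order_accuracy(guess, truth))
    print(gap)
```

A pairwise measure gives partial credit when only some images are misplaced, whereas an exact-match rule would score the example guess as zero.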

IMPACT Highlights a significant gap in VLM temporal reasoning, suggesting areas for future model development and fine-tuning.

RANK_REASON The cluster describes a new academic paper introducing a benchmark for evaluating AI models.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CV TIER_1 English(EN) · Parthaw Goswami, Jaynto Goswami Deep

    Chronological Blindness: Benchmarking Temporal Reasoning in Vision-Language Models with CHRONOSIGHT

    arXiv:2606.16334v1. Abstract: Human perception of visual scenes is inherently temporal. We instinctively recognise whether a fruit is ripening or rotting, whether construction is progressing or being demolished, and approximately how much time separates two phot…

  2. arXiv cs.CV TIER_1 English(EN) · Jaynto Goswami Deep

    Chronological Blindness: Benchmarking Temporal Reasoning in Vision-Language Models with CHRONOSIGHT

    Human perception of visual scenes is inherently temporal. We instinctively recognise whether a fruit is ripening or rotting, whether construction is progressing or being demolished, and approximately how much time separates two photographs of the same subject. Whether large visio…