PulseAugur

New benchmark reveals VLMs struggle with high-res Earth observation details

Researchers have introduced UHR-Micro, a benchmark that evaluates how well Vision-Language Models (VLMs) perceive small but critical details in ultra-high-resolution Earth observation imagery. Current VLMs often suffer from a "resolution illusion": high input resolution does not translate into reliable perception of micro-scale targets. The benchmark, comprising over 11,000 instructions and 1,200 images, reveals significant failures in spatial grounding and evidence parsing by existing models. To address this, the team developed the Micro-evidence Active Perception (MAP) agent, which improves perception by focusing reasoning on localized observations rather than on the entire high-resolution image.
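The summary gives no implementation details for MAP, so the following is only a generic illustration of what working from "localized observations" can look like, not the authors' method. It is a minimal Python sketch that splits a large image into overlapping tiles so that each region can be inspected at native resolution. The function names `tile_boxes` and `_starts` and the default tile size and overlap are assumptions chosen for illustration.

```python
from typing import List, Tuple

# (left, top, right, bottom) in pixels; hypothetical alias for this sketch
Box = Tuple[int, int, int, int]


def _starts(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets along one axis so that the tiles cover [0, length)."""
    if length <= tile:
        return [0]  # a single tile already spans the whole axis
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] + tile < length:
        # Add a final tile flush with the far edge so nothing is missed.
        starts.append(length - tile)
    return starts


def tile_boxes(width: int, height: int,
               tile: int = 1024, overlap: int = 128) -> List[Box]:
    """Return crop boxes that jointly cover a width x height image.

    Illustrative only: tile size and overlap are assumed values,
    not parameters reported in the summarized paper.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    stride = tile - overlap
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in _starts(height, tile, stride)
        for x in _starts(width, tile, stride)
    ]
```

Each box uses the (left, top, right, bottom) ordering, which matches the convention of Pillow's `Image.crop`, so a box can be passed to it directly. The overlap keeps a small target that straddles a tile border fully visible in at least one crop, provided the target is smaller than the overlap.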

Impact: Highlights the limitations of current VLMs in perceiving critical micro-scale details in high-resolution imagery, and motivates research into more evidence-centered reasoning agents.

Ranking rationale: The cluster covers a new academic paper that introduces a benchmark for evaluating VLMs and proposes an agent to mitigate the failures it finds.

AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

  1. arXiv cs.CV · English (EN) · Bo Du

    UHR-Micro: Diagnosing and Mitigating the Resolution Illusion in Earth Observation VLMs

    Vision-Language Models (VLMs) increasingly operate on ultra-high-resolution (UHR) Earth observation imagery, yet they remain vulnerable to a severe scale mismatch between large-scale scene context and micro-scale targets. We refer to this empirical gap as a "resolution illusion":…