PulseAugur

HorusEye framework uses language as dynamic attention for emergency visual analysis

A new research paper introduces HorusEye, a framework for emergency visual analysis that treats language as dynamic attention. The study benchmarks several vision-language models (VLMs), including Gemini, Qwen2-VL, BLIP-2, LLaVA, and Kosmos-2, on a dataset of degraded images simulating conditions such as fog, smoke, and thermal imagery. The key finding is that the effect of language feedback varies by model: Gemini improves substantially under thermal conditions, while Qwen2-VL degrades. The research also describes a 'Thermal Paradox', in which image cropping strategies that work for RGB fail on thermal imagery, and notes that BLIP-2 is the only model that hallucinates more under degradation.

IMPACT Introduces a novel approach for emergency visual analysis, highlighting model-specific performance variations and challenges in degraded conditions.

RANK_REASON Research paper introducing a new framework and evaluating existing models on a novel dataset.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.LG TIER_1 English(EN) · Armel Yara

    HorusEye: Language as Dynamic Attention for Emergency Visual Analysis

    arXiv:2606.14741v1 Announce Type: cross Abstract: We introduce HorusEye, Language as Dynamic Attention for Emergency Visual Analysis. Our investigation followed five stages. The first one is benchmarking RefCOCO-Degraded, a dataset of 15,244 images (3,811 base images x 4 conditio…