PulseAugur

New framework estimates LVLM confidence by contrasting image-based predictions

Researchers have developed a framework called BICR (Blind-Image Contrastive Ranking) for estimating the confidence of Large Vision-Language Models (LVLMs). The method distinguishes predictions that are genuinely informed by the visual input from those that rely solely on language priors. BICR trains a lightweight probe on the LVLM's hidden states, contrasting the states produced with the image against those produced with the image obscured, and penalizes the probe when confidence is higher for the obscured image. Evaluated on multiple LVLMs and diverse tasks, BICR showed better calibration and discrimination than existing baselines while using significantly fewer parameters.
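The contrast described above can be illustrated with a small sketch. It is not the paper's implementation: the linear probe, the hinge-style ranking term, and the names `probe_confidence`, `contrastive_ranking_loss`, `margin`, and `lam` are all assumptions made for illustration, and the summary does not specify the actual objective.

```python
import math


def probe_confidence(hidden, weights, bias):
    """Hypothetical linear probe: map a hidden-state vector to a confidence in (0, 1)."""
    logit = sum(h * w for h, w in zip(hidden, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def contrastive_ranking_loss(h_image, h_blind, is_correct, weights, bias,
                             margin=0.1, lam=1.0):
    """Assumed objective: a calibration term plus a blind-image ranking penalty.

    h_image: hidden state when the model sees the real image
    h_blind: hidden state when the image is obscured
    is_correct: 1 if the model's answer was correct, else 0
    """
    c_img = probe_confidence(h_image, weights, bias)
    c_blind = probe_confidence(h_blind, weights, bias)

    # Binary cross-entropy pushes image-conditioned confidence toward correctness.
    eps = 1e-12
    bce = -(is_correct * math.log(c_img + eps)
            + (1 - is_correct) * math.log(1.0 - c_img + eps))

    # Hinge penalty: positive whenever the blind confidence is not at least
    # `margin` below the image-conditioned confidence.
    rank_penalty = max(0.0, margin + c_blind - c_img)

    return bce + lam * rank_penalty, rank_penalty


# Toy 2-d hidden states; all values are made up for illustration.
weights, bias = [1.0, -1.0], 0.0
grounded_loss, grounded_pen = contrastive_ranking_loss(
    [2.0, 0.0], [0.0, 0.0], 1, weights, bias)
guessing_loss, guessing_pen = contrastive_ranking_loss(
    [0.0, 0.0], [2.0, 0.0], 1, weights, bias)
```

In this toy setup the first case incurs no ranking penalty because the probe is more confident with the image than without it, while the second case is penalized because obscuring the image raises the confidence. A real probe would be trained by gradient descent on hidden states extracted from the LVLM.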

Impact: Improves the reliability of vision-language models by identifying predictions that are not grounded in visual input.

Ranking rationale: Academic paper introducing a novel method for confidence estimation in LVLMs.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

  1. arXiv cs.CL · Tier 1 · English (EN) · Mohammad M. Ghassemi

    Grounded or Guessing? LVLM Confidence Estimation via Blind-Image Contrastive Ranking

    Large vision-language models suffer from visual ungroundedness: they can produce a fluent, confident, and even correct response driven entirely by language priors, with the image contributing nothing to the prediction. Existing confidence estimation methods cannot detect this, as…