Grounded or Guessing? LVLM Confidence Estimation via Blind-Image Contrastive Ranking

New framework estimates LVLM confidence by contrasting predictions with and without the image

By the PulseAugur editorial team · [1 source] · 2026-05-11 17:35

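Researchers have developed a new framework called BICR (Blind-Image Contrastive Ranking) for estimating the confidence of large vision-language models (LVLMs). The method helps distinguish predictions that are genuinely driven by the visual input from those that rely on language priors alone. BICR trains a lightweight probe that contrasts the LVLM's hidden states with and without the image, lowering confidence when the image is blacked out. Evaluated across multiple LVLMs and a range of tasks, BICR shows superior calibration and discrimination while using far fewer parameters than existing baselines.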
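To make the blind-image contrast concrete, here is a minimal sketch of the general idea, not the paper's implementation. It assumes a linear probe with a sigmoid output, a pairwise hinge (margin ranking) loss, and synthetic vectors standing in for LVLM hidden states; the names `probe_confidence`, `train_probe`, and `make_toy_pairs` are hypothetical, and the actual probe architecture, loss, and data used by BICR are not given in this summary.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def probe_confidence(w, b, h):
    """Hypothetical linear probe: map a hidden-state vector h to (0, 1)."""
    return sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)


def train_probe(pairs, dim, margin=0.2, lr=0.5, epochs=200):
    """Fit the probe with a pairwise hinge (ranking) loss (an assumption).

    pairs: list of (h_image, h_blind) vectors for the same prompt, one
           taken with the real image and one with the image blacked out.
    Loss per pair: max(0, margin - (c_image - c_blind)).
    """
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for h_img, h_blind in pairs:
            c_img = probe_confidence(w, b, h_img)
            c_blind = probe_confidence(w, b, h_blind)
            if margin - (c_img - c_blind) <= 0:
                continue  # pair is already ranked with enough margin
            # Gradient of the loss w.r.t. each branch's logit (sigmoid derivative).
            g_img = -c_img * (1.0 - c_img)
            g_blind = c_blind * (1.0 - c_blind)
            for i in range(dim):
                w[i] -= lr * (g_img * h_img[i] + g_blind * h_blind[i])
            b -= lr * (g_img + g_blind)
    return w, b


def make_toy_pairs(n, dim, seed):
    """Synthetic stand-in for hidden states: the image shifts coordinate 0."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        shared = [rng.gauss(0.0, 0.3) for _ in range(dim)]
        h_img = list(shared)
        h_img[0] += 1.0   # toy "visual evidence present" signal
        h_blind = list(shared)
        h_blind[0] -= 1.0  # toy "image blacked out" signal
        pairs.append((h_img, h_blind))
    return pairs


DIM = 4
train_pairs = make_toy_pairs(50, DIM, seed=0)
test_pairs = make_toy_pairs(20, DIM, seed=1)
w, b = train_probe(train_pairs, DIM)

# Fraction of held-out pairs where the blind input gets lower confidence.
ranked = sum(
    probe_confidence(w, b, hi) > probe_confidence(w, b, hb)
    for hi, hb in test_pairs
) / len(test_pairs)
print(f"correctly ranked pairs: {ranked:.2f}")
```

In a real setting the toy vectors would be replaced by hidden states extracted from an LVLM for the same prompt with and without the image; the ranking objective only requires that the probe score the blind version lower, which is the behavior the summary attributes to BICR.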

Impact: Improves the reliability of vision-language models by identifying predictions that are not grounded in the visual input.

Ranking rationale: An academic paper introducing a new method for LVLM confidence estimation.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL · Mohammad M. Ghassemi · 2026-05-11 17:35

Grounded or Guessing? LVLM Confidence Estimation via Blind-Image Contrastive Ranking

Large vision-language models suffer from visual ungroundedness: they can produce a fluent, confident, and even correct response driven entirely by language priors, with the image contributing nothing to the prediction. Existing confidence estimation methods cannot detect this, as…
