PulseAugur
English(EN) Detecting Clinical Hallucinations in LVLMs via Counterfactual Visual Grounding Uncertainty

New methods address hallucinations in large vision-language models · Tracking 4 sources

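Researchers have developed several new methods to combat hallucinations in large vision-language models (LVLMs), which occur when a model generates text that is not supported by the input image. One method, called visually traceable hallucination detection, uses visual evidence grounding and counterfactual perturbation to identify unsupported textual claims. Another framework, ViPSy, improves faithfulness by synthesizing preference data: it focuses on recurring object-level content and conditions rollouts on visual cues. In addition, Oriented Pickup Preference Optimization (OPPO) learns preferences based on the strength of visual evidence rather than response quality alone, using ordered evidence margins to increase visual sensitivity. Finally, Context-Aware Attention Intervention (CAI) is a training-free mechanism that selectively intervenes in the attention process, strengthening visual grounding only when necessary so as to preserve linguistic fluency.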
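
To make the counterfactual-perturbation idea concrete, here is a minimal toy sketch in Python. It is not taken from any of the papers: the names `mask_region`, `counterfactual_drop`, `flag_unsupported`, and `toy_scorer`, the zero-fill masking, and the `min_drop` threshold are all assumptions made for illustration. The intuition is that a claim that is truly grounded in an image region should lose support when that region is masked out, while a claim driven by language priors will barely change.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]          # toy grayscale image as nested lists
Box = Tuple[int, int, int, int]    # (row0, col0, row1, col1), end-exclusive
Scorer = Callable[[Image, str], float]


def mask_region(image: Image, box: Box, fill: float = 0.0) -> Image:
    """Return a copy of the image with the boxed region overwritten by `fill`."""
    r0, c0, r1, c1 = box
    return [
        [fill if (r0 <= r < r1 and c0 <= c < c1) else v
         for c, v in enumerate(row)]
        for r, row in enumerate(image)
    ]


def counterfactual_drop(image: Image, box: Box, claim: str, scorer: Scorer) -> float:
    """How much the claim's score falls once its supposed evidence is masked."""
    return scorer(image, claim) - scorer(mask_region(image, box), claim)


def flag_unsupported(image: Image, box: Box, claim: str, scorer: Scorer,
                     min_drop: float = 0.2) -> bool:
    """Flag the claim when masking the evidence barely changes its score."""
    return counterfactual_drop(image, box, claim, scorer) < min_drop


def toy_scorer(image: Image, claim: str) -> float:
    """Hypothetical stand-in for a model's confidence in a claim.

    "bright spot" depends on the pixels; every other claim gets a constant
    score, mimicking a language prior that ignores the image.
    """
    if claim == "bright spot":
        return max(max(row) for row in image)
    return 0.5


img = [[0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0]]
box = (1, 1, 2, 2)  # region cited as evidence for the claim

print(flag_unsupported(img, box, "bright spot", toy_scorer))  # False
print(flag_unsupported(img, box, "fracture", toy_scorer))     # True
```

In a real system the scorer would be the LVLM's own likelihood or confidence for the claim, and the evidence box would come from a grounding step; both are outside the scope of this sketch.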

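Impact: These advances could significantly improve the reliability and trustworthiness of LVLMs in critical applications such as healthcare.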

Ranking rationale: Multiple research papers propose new methods for mitigating hallucinations in large vision-language models.


AI-generated summary · Google Gemini · From 4 sources.


Sources [4]

  1. arXiv cs.CL TIER_1 English(EN) · Xiao Song, Haonan Qin, Zhaoxu Zhang, Jiong Zhang, Yuqi Fang, Caifeng Shan ·

    Detecting Clinical Hallucinations in LVLMs via Counterfactual Visual Grounding Uncertainty

    arXiv:2606.28520v1 Announce Type: cross Abstract: Large vision-language models (LVLMs) are increasingly used for clinical image understanding, yet they remain vulnerable to hallucinations, producing textual findings or attributes not supported by the image. We present a vi…

  2. arXiv cs.LG TIER_1 English(EN) · Yunhun Nam, Jongheon Jeong ·

    Vision-driven Preference Synthesis for Mitigating Hallucinations in VLMs

    arXiv:2606.28401v1 Announce Type: cross Abstract: Vision-Language Models (VLMs) have shown strong performance in visual understanding, yet they still suffer from hallucinations, generating content that is not grounded in the image. Preference alignment is a promising approach to …

  3. arXiv cs.CV TIER_1 English(EN) · Xin Zou, Haolin Deng, Yibo Yan, Shuliang Liu, Zhiwei Jin, Chen Chen, Haonan Lu, Xuming Hu ·

    Clearer Sight, Fewer Lies: Oriented Pickup Preference Optimization for Multimodal Hallucination Mitigation

    arXiv:2606.29805v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) are prone to hallucination as their generation preferences are insufficiently calibrated to visual evidence, causing them to fall back on linguistic priors, rather than faithful grounding. In…

  4. arXiv cs.CV TIER_1 English(EN) · Yuqing Lei, Wenbo Lyu, Yingjun Du, Xiantong Zhen, Cees G. M. Snoek, Ling Shao ·

    See Only When Needed: Context-Aware Attention Intervention for Mitigating Hallucinations in LVLMs

    arXiv:2606.29847v1 Announce Type: new Abstract: Large Vision-Language Models (LVLMs) excel at multimodal tasks but remain prone to object hallucinations. Prior training-free remedies often uniformly strengthen visual signals, which may also amplify irrelevant regions and introduc…