PulseAugur
Live 11:49:36
English(EN) Towards Robustness against Typographic Attack with Training-free Concept Localization

New method strengthens LVLM robustness against typographic attacks

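Researchers have developed a new method to make large vision language models (LVLMs) more robust against typographic attacks, in which irrelevant text appearing in an image misleads the model. The method is training-free: it uses mechanistic interpretability to identify the specific components of the vision Transformer (ViT) that encode lexical information. It then selectively adjusts the attention weights in these identified circuits, without any retraining. This substantially improves robustness to typographic attacks on object classification tasks and raises accuracy on visual question answering benchmarks such as RIO-Bench.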
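The summary describes the intervention only at a high level: localize the ViT components that encode text, then reweight their attention at inference time. The Python/NumPy sketch below illustrates one generic form of such a training-free edit. It implements multi-head self-attention with a per-head multiplier applied to the attention weights after the softmax. The `head_scale` parameter, the choice of heads 1 and 3, and the choice to zero their weights are all hypothetical illustrations. They are not the authors' localization procedure or their exact adjustment.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v, w_o, n_heads, head_scale=None):
    """Multi-head self-attention over tokens x of shape (n_tokens, d_model).

    head_scale (hypothetical knob): optional length-n_heads array that
    multiplies each head's attention weights after the softmax.
    1.0 leaves a head unchanged; 0.0 silences it entirely.
    """
    n, d = x.shape
    d_head = d // n_heads

    def split(m):
        # (n, d) -> (n_heads, n, d_head)
        return m.reshape(n, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))  # (h, n, n)
    if head_scale is not None:
        # Inference-time edit: rescale whole heads; no weights are trained.
        attn = attn * np.asarray(head_scale, dtype=float)[:, None, None]
    heads = attn @ v                                   # (h, n, d_head)
    return heads.transpose(1, 0, 2).reshape(n, d) @ w_o

# Toy usage with random weights (illustrative only, not a real ViT).
rng = np.random.default_rng(0)
n_tokens, d_model, n_heads = 5, 16, 4
x = rng.standard_normal((n_tokens, d_model))
w_q, w_k, w_v, w_o = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))

# Hypothetical: suppose heads 1 and 3 were localized as "text-reading" heads.
scale = np.ones(n_heads)
scale[[1, 3]] = 0.0
baseline = self_attention(x, w_q, w_k, w_v, w_o, n_heads)
edited = self_attention(x, w_q, w_k, w_v, w_o, n_heads, head_scale=scale)
```

Each head's output enters the output projection additively. Scaling a head's attention weights by a factor c therefore scales only that head's contribution by c and leaves the other heads untouched. Because nothing is retrained, an edit of this shape can be applied to a frozen encoder.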

Impact: Improves the reliability of LVLMs in safety-critical applications by reducing the influence of misleading text in images.

Ranking rationale: Academic paper detailing a new method for improving model robustness.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 2 sources.


Sources [2]

  1. arXiv cs.CL TIER_1 English(EN) · Bohan Liu, Wenqian Ye, Guangzhi Xiong, Zhenghao He, Sanchit Sinha, Aidong Zhang ·

    Towards Robustness against Typographic Attack with Training-free Concept Localization

    arXiv:2607.02494v1 Announce Type: cross Abstract: Models trained via Contrastive Language-Image Pretraining (CLIP) serve as the foundational vision encoders for most modern Large Vision Language Models (LVLMs). Despite their widespread adoption, CLIP models exhibit a critical yet…

  2. arXiv cs.CL TIER_1 English(EN) · Aidong Zhang ·

    Towards Robustness against Typographic Attack with Training-free Concept Localization

    Models trained via Contrastive Language-Image Pretraining (CLIP) serve as the foundational vision encoders for most modern Large Vision Language Models (LVLMs). Despite their widespread adoption, CLIP models exhibit a critical yet underexplored failure mode: irrelevant text appea…