FBHM: Functional Benchmarking and Steering of VLMs for Hateful Meme Detection

New FBHM benchmark reveals VLM weaknesses in hateful meme detection

By PulseAugur Editorial · [2 sources] · 2026-05-29 14:27

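Researchers have developed a new benchmark, FBHM, to better evaluate how well vision-language models (VLMs) detect hateful memes. Existing benchmarks often confound rhetorical strategies with target-community features, which prevents causal analysis of VLM vulnerabilities. FBHM comprises 5,000 memes spanning 25 functionalities and 10 target communities. State-of-the-art VLMs perform poorly on this new dataset, which suggests that they rely on dataset-specific heuristics rather than robust multimodal reasoning. To address this, the researchers propose LSV, a strategy that uses a small amount of data (as few as 500 samples) and significantly improves VLM performance on FBHM.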
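One way to picture why separating functionalities from target communities matters: when every example carries both tags, accuracy can be broken down along each axis independently, so a weakness tied to a rhetorical mechanism is not mistaken for one tied to a community. The sketch below is only an illustration under assumptions. The field names (`functionality`, `community`, `label`, `pred`), the tags `F1`, `F2`, `C1`, `C2`, and the four records are hypothetical toy values, not FBHM data or the authors' evaluation code.

```python
from collections import defaultdict


def accuracy_by(records, key):
    """Return {group: accuracy}, grouping records by record[key]."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r[key]] += 1
        correct[r[key]] += int(r["label"] == r["pred"])
    return {group: correct[group] / total[group] for group in total}


# Hypothetical toy records: each example is tagged with BOTH a
# functionality and a target community (made-up values, not FBHM data).
records = [
    {"functionality": "F1", "community": "C1", "label": 1, "pred": 1},
    {"functionality": "F1", "community": "C2", "label": 1, "pred": 0},
    {"functionality": "F2", "community": "C1", "label": 0, "pred": 0},
    {"functionality": "F2", "community": "C2", "label": 1, "pred": 1},
]

# Break accuracy down along each axis separately.
by_functionality = accuracy_by(records, "functionality")
by_community = accuracy_by(records, "community")

print(by_functionality)  # {'F1': 0.5, 'F2': 1.0}
print(by_community)      # {'C1': 1.0, 'C2': 0.5}
```

With real predictions, a low score concentrated in one functionality across all communities would point to the rhetorical mechanism, while a low score concentrated in one community across all functionalities would point to the target.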

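Impact: The benchmark is expected to drive the development of more robust multimodal reasoning in AI, improving safety and reducing the generation of harmful content.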

Ranking rationale: This cluster contains a research paper that introduces a new benchmark and methodology for evaluating AI models.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Paramananda Bhaskar, Naquee Rizwan, Daksh Jogchand, Saurabh Kumar Pandey, Animesh Mukherjee · 2026-06-01 04:00

FBHM: Functional Benchmarking and Steering of VLMs for Hateful Meme Detection

arXiv:2605.31349v1 Announce Type: cross Abstract: Hateful meme detection remains a formidable challenge for vision-language models, as existing benchmarks are structurally observational - confounding rhetorical hate mechanisms with target community features and preventing causal …
arXiv cs.AI TIER_1 English(EN) · Animesh Mukherjee · 2026-05-29 14:27

FBHM: Functional Benchmarking and Steering of VLMs for Hateful Meme Detection

Hateful meme detection remains a formidable challenge for vision-language models, as existing benchmarks are structurally observational - confounding rhetorical hate mechanisms with target community features and preventing causal evaluation of model vulnerabilities. To address th…

