Brief · PulseAugur

RESEARCH · arXiv cs.AI English(EN) · 2w · [2 sources]

FBHM: Functional Benchmarking and Steering of VLMs for Hateful Meme Detection

Researchers have introduced FBHM, a benchmark for evaluating how well vision-language models (VLMs) detect hateful memes. Existing benchmarks often conflate rhetorical strategies with features of the targeted community, which hinders causal analysis of VLM vulnerabilities. FBHM comprises 5,000 memes spanning 25 functionalities and 10 target communities. Current state-of-the-art VLMs perform poorly on it, which suggests they rely on dataset-specific heuristics rather than robust multimodal reasoning. To address this, the researchers propose LSV, a low-data strategy that uses as few as 500 samples to significantly improve VLM performance on FBHM.
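To illustrate why separating functionalities from target communities matters, here is a minimal sketch of a disaggregated evaluation: accuracy is reported per functionality and per target community instead of as one aggregate score. The record fields (`functionality`, `target`, `label`, `prediction`), the helper `accuracy_by`, and the toy records are all hypothetical and are not taken from the FBHM paper or its data.

```python
from collections import defaultdict


def accuracy_by(records, key):
    """Return {group: accuracy} for records grouped by the field `key`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        group = r[key]
        total[group] += 1
        correct[group] += int(r["label"] == r["prediction"])
    return {g: correct[g] / total[g] for g in total}


# Toy, made-up records for illustration only (not FBHM data).
records = [
    {"functionality": "F1", "target": "T1", "label": 1, "prediction": 1},
    {"functionality": "F1", "target": "T2", "label": 1, "prediction": 0},
    {"functionality": "F2", "target": "T1", "label": 0, "prediction": 0},
    {"functionality": "F2", "target": "T2", "label": 1, "prediction": 1},
]

per_functionality = accuracy_by(records, "functionality")
per_target = accuracy_by(records, "target")
print(per_functionality)
print(per_target)
```

A breakdown of this kind can show whether a model's errors cluster around a particular rhetorical strategy or a particular target group, which a single overall accuracy figure would hide.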

IMPACT This benchmark could drive the development of more robust multimodal reasoning in VLMs, improving safety and the detection of harmful content.

vision-language models
Paramananda Bhaskar
FBHM