Researchers have introduced HarmVideoBench, a benchmark for evaluating how well large vision-language models (LVLMs) understand harmful video content. Existing benchmarks often reduce harmful content to binary classification and omit explanatory rationales, which makes evaluation a black box. HarmVideoBench instead takes a multi-layered diagnostic approach: 1,379 videos and 4,137 multiple-choice questions (an average of three per video) assess models on observable evidence, clip-internal meaning, and beyond-clip reasoning. The authors also introduce BCR, a method that predicts reasoning boundaries and dynamically retrieves context, raising the average score from 61.7% to 84.4%, a gain of 22.7 percentage points.
AI IMPACT This benchmark could drive improvements in AI's ability to understand and moderate harmful video content, leading to safer online environments.
RANK_REASON The cluster describes a new academic benchmark for evaluating AI models, published on arXiv.
- alphaXiv
- arXiv
- CatalyzeX
- CORE Recommender
- DagsHub
- Gotit.pub
- HarmVideoBench
- Hugging Face
- Large Multimodal Models
- Large Vision Language Models
- ScienceCast
AI-generated summary · Google Gemini · from 3 sources.