PulseAugur
HarmVideoBench: Benchmarking Harmful Video Understanding in Large Multimodal Models

New HarmVideoBench benchmark evaluates how well large multimodal models understand subtly harmful videos · tracking 2 sources

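Researchers have introduced HarmVideoBench, a new benchmark for evaluating how well large vision-language models (LVLMs) understand harmful video. Existing benchmarks tend to reduce harmful content to binary classification and lack explanatory rationales, which makes evaluation results opaque. HarmVideoBench addresses these limitations with a multi-level diagnostic design: 1,379 videos and 4,137 multiple-choice questions that test models on observable evidence, within-clip implication, and beyond-clip reasoning. The work also introduces the BCR method, which predicts the reasoning boundary and dynamically retrieves context, raising the average score from 61.7% to 84.4%.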
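The summary does not describe the benchmark's scoring protocol, so the following is only a minimal sketch of how per-level accuracy on a three-level multiple-choice benchmark could be computed. The level names, the record layout, and the unweighted average are assumptions made for illustration, and the toy records are not results from the paper. The last two lines only do arithmetic on the figures quoted above: 4,137 questions over 1,379 videos works out to three questions per video, and the reported gain is 22.7 percentage points.

```python
from collections import defaultdict

# Hypothetical level names, taken from the three layers named in the summary.
LEVELS = ("observable_evidence", "within_clip_implication", "beyond_clip_reasoning")


def per_level_accuracy(records):
    """Compute accuracy per level.

    records: iterable of (level, predicted_option, correct_option) tuples.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for level, predicted, correct in records:
        totals[level] += 1
        hits[level] += int(predicted == correct)
    return {lvl: hits[lvl] / totals[lvl] for lvl in LEVELS if totals[lvl]}


def average_score(accuracy_by_level):
    """Unweighted mean over levels (an assumption, not the paper's stated metric)."""
    return sum(accuracy_by_level.values()) / len(accuracy_by_level)


# Toy answers for illustration only.
toy = [
    ("observable_evidence", "A", "A"),
    ("observable_evidence", "B", "B"),
    ("within_clip_implication", "C", "C"),
    ("within_clip_implication", "A", "D"),
    ("beyond_clip_reasoning", "B", "A"),
    ("beyond_clip_reasoning", "D", "D"),
]
acc = per_level_accuracy(toy)
print(acc, average_score(acc))

# Arithmetic on the figures quoted in the summary.
questions_per_video = 4137 / 1379    # three questions per video
gain_points = round(84.4 - 61.7, 1)  # gain in percentage points
```

Reporting accuracy per level rather than a single pooled number is what lets a benchmark of this kind show where a model breaks down, for example strong on observable evidence but weak on beyond-clip reasoning.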

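Impact: The benchmark could help advance AI's ability to understand and moderate harmful video content, contributing to a safer online environment.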

Ranking rationale: This cluster describes a new academic benchmark for evaluating AI models, published on arXiv.


AI-generated summary · Google Gemini · from 3 sources.


Coverage sources [3]

  1. arXiv cs.CL TIER_1 English(EN) · Jiajun Wu, Haoyu Kang, Yining Sun, Jiacheng Hou, Heng Zhang, Danyang Zhang, Zhenjun Zhao, Haochi Zhang, Leixin Sun, Eric Hanchen Jiang, Yushan Li, Ruiyu Li, Mengkai Huang, Yan Gao, Xu Zhang, Guancheng Wan ·

    HarmVideoBench: Benchmarking Harmful Video Understanding in Large Multimodal Models

    arXiv:2606.27187v1 (announce type: cross). Abstract: Large vision-language models (LVLMs) have recently shown immense potential in automated content moderation, sparking growing interest in developing harmful-video benchmarks. However, we identify two primary limitations in existing…

  2. Hugging Face Daily Papers TIER_1 English(EN) ·

    HarmVideoBench: Benchmarking Harmful Video Understanding in Large Multimodal Models

    Large vision-language models (LVLMs) have recently shown immense potential in automated content moderation, sparking growing interest in developing harmful-video benchmarks. However, we identify two primary limitations in existing works: 1) The multi-layered characteristics of ha…

  3. arXiv cs.CV TIER_1 English(EN) · Guancheng Wan ·

    HarmVideoBench: Benchmarking Harmful Video Understanding in Large Multimodal Models

    Large vision-language models (LVLMs) have recently shown immense potential in automated content moderation, sparking growing interest in developing harmful-video benchmarks. However, we identify two primary limitations in existing works: 1) The multi-layered characteristics of ha…