Enhancing Video Representations with Spatiotemporal-Semantic Residual to Mitigate Hallucinations in Video Large Multimodal Models

New Method ViSSRes Reduces Hallucinations in Video Models

By the PulseAugur editorial team · [1 source] · 2026-06-08 04:00

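Researchers have developed a new method, ViSSRes, for reducing hallucinations in video large multimodal models. The technique enhances video representations with a lightweight network that accounts for spatiotemporal consistency and semantic relevance. ViSSRes runs at inference time without significantly increasing latency, and it substantially lowered hallucination rates on benchmark datasets.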
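To make the idea of a residual added to video features concrete, here is a minimal sketch in plain Python. It is not the ViSSRes architecture, which this summary does not specify. The function names, the neighbour-averaging temporal term, the cosine-similarity gate, and the `alpha` scale are all assumptions chosen for illustration, standing in for whatever the paper's lightweight network actually learns.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def enhance_with_residual(frames, query, alpha=0.5):
    """Return frames[t] + alpha * gate_t * residual_t for every frame t.

    Hypothetical stand-ins, not the paper's method:
      residual_t: mean of the neighbouring frames' features minus frames[t]
                  (a hand-written temporal-consistency term).
      gate_t:     cosine similarity between frames[t] and the query,
                  clipped below at 0 (a hand-written semantic-relevance term).
    """
    enhanced = []
    for t, feat in enumerate(frames):
        neighbours = [frames[i] for i in (t - 1, t + 1) if 0 <= i < len(frames)]
        if neighbours:
            mean = [sum(col) / len(neighbours) for col in zip(*neighbours)]
        else:
            mean = feat  # single-frame clip: the temporal residual is zero
        residual = [m - f for m, f in zip(mean, feat)]
        gate = max(0.0, cosine(feat, query))
        enhanced.append([f + alpha * gate * r for f, r in zip(feat, residual)])
    return enhanced

# Three toy 2-D frame features and a toy query embedding.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
query = [1.0, 0.0]
print(enhance_with_residual(frames, query))
```

The original features pass through unchanged when `alpha` is 0, which is the defining property of a residual formulation: the enhancement is an additive correction applied at inference time, not a replacement of the representation.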

Impact: Lowers the hallucination rate of video understanding models, improving the reliability of AI applications.

Ranking rationale: This cluster contains one research paper detailing a new method for improving video large multimodal models.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · TIER_1 · English (EN) · Yuansheng Gao, Jinman Zhao, Tong Zhang, Xingguo Xu, Wenbin Xing, Han Bao, Zonghui Wang, Wenzhi Chen · 2026-06-08 04:00

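Enhancing Video Representations with Spatiotemporal-Semantic Residual to Mitigate Hallucinations in Video Large Multimodal Models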

arXiv:2601.22574v2 Announce Type: replace-cross Abstract: Although Video Large Multimodal Models have achieved strong performance in video understanding, they still suffer from hallucination. Existing inference-time intervention methods usually modify videos under the contrastive…