A Comprehensive Survey of Knowledge-Based Vision Question Answering Systems: The Lifecycle of Knowledge in Visual Reasoning Task

New Surveys Map Progress in Visual Reasoning and KB-VQA

By the PulseAugur editorial team · [2 sources] · 2026-06-16 04:00

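Two new arXiv surveys give broad overviews of visual reasoning tasks in computer vision. The first details Knowledge-Based Vision Question Answering (KB-VQA) systems, categorizing them by knowledge representation, retrieval, and reasoning, and highlights the impact of large language models (LLMs) on the field. The second offers a taxonomy of visual reasoning with five types (relational, symbolic, temporal, causal, and commonsense) and examines a range of methods, including LLMs and multimodal large language models (MLLMs). Both papers identify persistent challenges and outline future research directions for advancing these AI capabilities.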

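Impact: These surveys consolidate current research, identify key challenges, and propose future directions for visual reasoning and knowledge-based VQA systems.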

Ranking rationale: Two academic papers published on arXiv provide comprehensive surveys of specific AI research areas.


AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv cs.CV · Jiaqi Deng, Zonghan Wu, Huan Huo, Guandong Xu · 2026-06-16 04:00

A Comprehensive Survey of Knowledge-Based Vision Question Answering Systems: The Lifecycle of Knowledge in Visual Reasoning Task

arXiv:2504.17547v2 · Abstract: Knowledge-based Vision Question Answering (KB-VQA) extends general Vision Question Answering (VQA) by not only requiring the understanding of visual and textual inputs but also extensive range of knowledge, enabling significant …

arXiv cs.CV · Ayushman Sarkar, Zhenyu Yu, Mohd Yamani Idna Idris · 2026-06-16 04:00

Reasoning in Computer Vision: Taxonomy, Models, Tasks, and Methodologies

arXiv:2508.10523v2 · Abstract: Visual reasoning matters for many computer vision tasks that go beyond surface-level object detection and classification. Despite progress in relational, symbolic, temporal, causal, and commonsense reasoning, existing surveys ty…
