Brief · PulseAugur

RESEARCH · arXiv cs.AI English(EN) · 5d · [8 sources]

WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata

Researchers have introduced several new benchmarks and methods for Visual Question Answering (VQA). HyLoVQA proposes a dynamic, hypernetwork-generated low-rank adaptation technique for continual VQA to improve adaptation to new tasks and objects. WikiVQABench is a knowledge-grounded VQA benchmark built from Wikipedia and Wikidata, designed to test models on questions that require external knowledge. UCSF-PDGM-VQA focuses on brain tumor MRI interpretation and highlights the limitations of current Vision-Language Models (VLMs) in clinical settings. RoboSurg-VQA addresses segmentation-aware VQA for surgery, and VISTAQA jointly benchmarks answer correctness and pixel-level evidence grounding.

IMPACT These new benchmarks and adaptation techniques aim to improve the reliability and capabilities of Vision-Language Models in complex, real-world scenarios.

VISTAQA
GROVE
Mozhgan Nasr Azadani
Visual Question Answering
Wikidata
Wikipedia
Large Language Models
Multimodal Large Language Models
WikiVQABench
HyLoVQA
Vision-Language Models
UCSF-PDGM-VQA
RoboSurg-VQA
Low-Rank Adaptation