WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata
Researchers have introduced several new benchmarks and methods for Visual Question Answering (VQA) systems. HyLoVQA proposes a dynamic hypernetwork-generated low-rank adaptation technique for continual VQA, improving adaptation to new tasks and objects. WikiVQABench offers a knowledge-grounded VQA benchmark using Wikipedia and Wikidata, designed to test models requiring external knowledge. Additionally, UCSF-PDGM-VQA focuses on brain tumor MRI interpretation, highlighting current VLM limitations in clinical settings, while RoboSurg-VQA addresses surgical segmentation-aware VQA, and VISTAQA benchmarks joint answer correctness and pixel-level evidence grounding. AI
IMPACT These new benchmarks and adaptation techniques aim to improve the reliability and capabilities of Vision-Language Models in complex, real-world scenarios.