Brief · PulseAugur

TOOL · arXiv cs.CL · English (EN) · 7h

Would you still call this Dax? Novel Visual References in VLMs and Humans

Researchers have introduced the Novel Visual References Dataset (NVRD), which comprises over 19,000 images across 90 visual concepts and is designed to test how vision-language models (VLMs) learn new concepts, especially when those concepts conflict with pre-existing knowledge. Evaluations of both open- and closed-source models, compared against human judgments, showed that, unlike humans, VLMs struggle to adapt to novel concepts in context and tend to overgeneralize learned labels to incorrect stimuli. NVRD is intended to serve as a benchmark for studying visual concept acquisition in both humans and machines.

IMPACT Establishes a new benchmark for evaluating VLM concept learning and generalization, highlighting current limitations compared to human capabilities.

vision-language models
Novel Visual References Dataset
NVRD