Brief · PulseAugur

TOOL · arXiv cs.CL · English (EN) · 1d

ValueGround: Evaluating Culture-Conditioned Visual Value Grounding in MLLMs

Researchers have introduced ValueGround, a benchmark that assesses how well multimodal large language models (MLLMs) understand and apply cultural values when the answer options are presented as images. Built from World Values Survey questions, the benchmark uses pairs of images that represent different value tendencies; a model must select the image that aligns with a specific country's values, without textual cues. In experiments, replacing text options with visual ones lowered average accuracy from 72.8% to 62.6%, a drop of about 10 percentage points, which highlights how difficult cross-modal cultural understanding remains.
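To put the reported accuracy figures in perspective, the short Python sketch below works out the gap between the text-option and image-option settings in both absolute and relative terms. The function name `accuracy_drop` is a hypothetical helper introduced here for illustration and is not part of the benchmark; only the two accuracy values come from the summary above.

```python
def accuracy_drop(text_acc: float, image_acc: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent).

    Hypothetical helper for illustration only; not from the ValueGround paper.
    """
    absolute = text_acc - image_acc          # gap in percentage points
    relative = absolute / text_acc * 100     # gap as a share of the text-option score
    return round(absolute, 1), round(relative, 1)


# Average accuracies reported in the brief: 72.8% (text) vs. 62.6% (images).
abs_drop, rel_drop = accuracy_drop(72.8, 62.6)
print(abs_drop, rel_drop)  # prints: 10.2 14.0
```

Under these figures, the switch to visual options costs 10.2 percentage points, or roughly 14% of the text-option accuracy.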

IMPACT Highlights challenges in cross-modal cultural understanding for MLLMs, potentially guiding future model development and evaluation.

MLLMs
World Values Survey
ValueGround
Zhipin Wang