Brief · PulseAugur

TOOL · arXiv cs.CV

GroupToM-Bench: Benchmarking Group Theory of Mind and Nonlinear Social Emergence in MLLMs

Researchers have introduced GroupToM-Bench, a benchmark for evaluating the group-level Theory of Mind (ToM) capabilities of multimodal large language models (MLLMs). It targets a limitation of current models, which excel at individual ToM but struggle to infer group outcomes from complex social dynamics. GroupToM-Bench assesses how models process social structures and nonlinear collective behaviors, and it reveals a significant gap between model performance and human baselines in predicting group-level results.

IMPACT This benchmark could drive research into AI's ability to understand and predict complex social interactions, a capability that is crucial for developing more sophisticated AI agents.

multimodal large language models
GroupToM-Bench