New GRASP dataset enhances AI's social reasoning in videos

By PulseAugur Editorial · [1 source] · 2026-05-15 09:24

Researchers have introduced GRASP, a new dataset and benchmark designed to improve multimodal large language models' (MLLMs) ability to understand social interactions in videos. GRASP connects high-level social question-answering with detailed analysis of gaze and deictic gestures across nearly 50,000 videos. The dataset also includes a novel learning signal, Social Grounding Reward (SGR), which uses these fine-grained events to train models to better identify participants in social interactions.

IMPACT Enhances AI's ability to interpret complex social dynamics in videos, potentially improving applications in human-computer interaction and video analysis.

RANK_REASON The cluster describes a new academic paper introducing a dataset and benchmark for AI research.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 · James M. Rehg · 2026-05-15 09:24

GRASP: Learning to Ground Social Reasoning in Multi-Person Non-Verbal Interactions

Understanding social interactions requires reasoning over subtle non-verbal cues, yet current multimodal large language models (MLLMs) often fail to identify who interacts with whom in multi-person videos. We introduce GRASP, a large-scale social reasoning dataset that connects h…
