New GRASP dataset enhances AI's social reasoning in videos

By the PulseAugur editorial team · [1 source] · 2026-05-15 09:24

Researchers have introduced GRASP, a dataset and benchmark designed to improve the ability of multimodal large language models (MLLMs) to understand social interactions in videos. GRASP connects high-level social question-answering with fine-grained analysis of gaze and deictic gestures across nearly 50,000 videos. The work also introduces a novel learning signal, Social Grounding Reward (SGR), which uses these fine-grained events to train models to better identify participants in social interactions.

Impact: Enhances AI's ability to interpret complex social dynamics in videos, potentially improving applications in human-computer interaction and video analysis.

Ranking rationale: The cluster describes a new academic paper introducing a dataset and benchmark for AI research.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CV TIER_1 · James M. Rehg · 2026-05-15 09:24

GRASP: Learning to Ground Social Reasoning in Multi-Person Non-Verbal Interactions

Understanding social interactions requires reasoning over subtle non-verbal cues, yet current multimodal large language models (MLLMs) often fail to identify who interacts with whom in multi-person videos. We introduce GRASP, a large-scale social reasoning dataset that connects h…

