PulseAugur

New GRASP dataset enhances AI's social reasoning in videos

Researchers have introduced GRASP, a new dataset and benchmark designed to improve the ability of multimodal large language models (MLLMs) to understand social interactions in videos. GRASP connects high-level social question-answering with detailed analysis of gaze and deictic gestures across nearly 50,000 videos. The dataset also includes a novel learning signal, Social Grounding Reward (SGR), which uses these fine-grained events to train models to better identify participants in social interactions.

Impact: Enhances AI's ability to interpret complex social dynamics in videos, potentially improving applications in human-computer interaction and video analysis.

Ranking rationale: The cluster describes a new academic paper introducing a dataset and benchmark for AI research.

Read on arXiv cs.CV

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

  1. arXiv cs.CV · James M. Rehg

    GRASP: Learning to Ground Social Reasoning in Multi-Person Non-Verbal Interactions

    Understanding social interactions requires reasoning over subtle non-verbal cues, yet current multimodal large language models (MLLMs) often fail to identify who interacts with whom in multi-person videos. We introduce GRASP, a large-scale social reasoning dataset that connects h…