PulseAugur

New GRASP dataset enhances AI's social reasoning in videos

Researchers have introduced GRASP, a dataset and benchmark designed to improve how well multimodal large language models (MLLMs) understand social interactions in videos. GRASP links high-level social question answering to fine-grained annotations of gaze and deictic gestures across nearly 50,000 videos. The work also introduces a learning signal, Social Grounding Reward (SGR), which uses these fine-grained events to train models to better identify the participants in a social interaction.

IMPACT Enhances AI's ability to interpret complex social dynamics in videos, potentially improving applications in human-computer interaction and video analysis.

RANK_REASON The cluster describes a new academic paper introducing a dataset and benchmark for AI research.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.CV TIER_1 · James M. Rehg

    GRASP: Learning to Ground Social Reasoning in Multi-Person Non-Verbal Interactions

    Understanding social interactions requires reasoning over subtle non-verbal cues, yet current multimodal large language models (MLLMs) often fail to identify who interacts with whom in multi-person videos. We introduce GRASP, a large-scale social reasoning dataset that connects h…