New GRACE framework boosts video MLLMs for sentiment prediction

By PulseAugur Editorial · [1 source] · 2026-06-16 04:00

Researchers have developed GRACE, a framework for improving how video Multimodal Large Language Models (MLLMs) predict viewer sentiment for video advertisements. GRACE addresses limitations of current MLLMs by extracting structured, action-centric evidence: subject-verb-object triplets and localized visual crops of the participating entities. Grounding emotional cues in specific visual elements and their temporal sequence lets the MLLM reason more precisely about viewer emotion. In experiments on the Pitts dataset, GRACE significantly improved performance over baseline models such as Qwen2.5-VL and Qwen3-VL, with further validation on the AdsQA and TVQA datasets.
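As a rough illustration of what "structured, action-centric evidence" could look like in practice, the sketch below models a subject-verb-object triplet with its entity crops and serializes a set of them in temporal order as text for an MLLM prompt. This is not the authors' implementation: the summary names the ingredients but not a schema, so every class, field, and function name here (`EntityCrop`, `ActionEvidence`, `evidence_to_prompt`) and the output format are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical schema: the source mentions triplets, entity crops, and temporal
# sequence, but does not define these types. All names below are assumptions.
BBox = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


@dataclass
class EntityCrop:
    name: str          # entity the crop shows, e.g. "child"
    frame_index: int   # frame the crop was taken from
    bbox: BBox         # crop location within that frame


@dataclass
class ActionEvidence:
    subject: str
    verb: str
    obj: str
    start_sec: float
    end_sec: float
    crops: List[EntityCrop] = field(default_factory=list)


def evidence_to_prompt(items: List[ActionEvidence]) -> str:
    """Render evidence as numbered lines, sorted by start time."""
    ordered = sorted(items, key=lambda e: e.start_sec)
    lines = []
    for i, ev in enumerate(ordered, start=1):
        # Fall back to "none" when an action has no associated crops.
        crops = ", ".join(
            f"{c.name}@frame{c.frame_index}" for c in ev.crops
        ) or "none"
        lines.append(
            f"{i}. [{ev.start_sec:.1f}s-{ev.end_sec:.1f}s] "
            f"({ev.subject}, {ev.verb}, {ev.obj}); crops: {crops}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        ActionEvidence("child", "hugs", "dog", 4.0, 6.5,
                       [EntityCrop("child", 120, (10, 20, 110, 220))]),
        ActionEvidence("dog", "runs to", "child", 1.0, 3.0),
    ]
    print(evidence_to_prompt(demo))
```

Sorting by start time keeps the temporal sequence explicit in the text the model reads, and naming each crop alongside its triplet ties the emotional cue to a specific visual element, which is the grounding idea the summary describes.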



AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Ruoxuan Yang, Tieyuan Chen, Xiaofeng Huang, Haibing Yin, Jun Wang, Xiping Chen, Jun Yin, Xuesong Gao, Weiyao Lin · 2026-06-16 04:00

GRACE: Boosting Video MLLMs with Grounded Action-Centric Evidence for Viewer Sentiment Prediction

arXiv:2606.16198v1 · Abstract: Viewer sentiment prediction in video advertisements aims to infer the latent affective response evoked in the audience. To bridge the gap between what is shown and what is felt, models must deduce hidden viewer emotions from explici…
