Researchers have developed GRACE, a new framework designed to improve how Multimodal Large Language Models (MLLMs) predict viewer sentiment toward video advertisements. GRACE addresses limitations of current MLLMs by extracting structured, action-centric evidence, including subject-verb-object triplets and localized visual crops of the participating entities. This lets MLLMs reason about emotion more precisely by grounding cues in specific visual elements and temporal sequences. Experiments on the Pitts dataset showed that GRACE significantly improves performance over baseline models such as Qwen2.5-VL and Qwen3-VL, with further validation on the AdsQA and TVQA datasets.
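To make "action-centric evidence" more concrete, here is a minimal illustrative sketch, not GRACE's actual implementation. It assumes each piece of evidence is a subject-verb-object triplet with a time span and a list of bounding boxes for the entity crops, and that the evidence is serialized in temporal order into text for an MLLM prompt. The names `ActionEvidence` and `to_prompt_lines`, the seconds-based timestamps, and the `(x0, y0, x1, y1)` box format are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixel coordinates.
Box = Tuple[int, int, int, int]


@dataclass
class ActionEvidence:
    """One hypothetical piece of action-centric evidence from an ad video."""
    subject: str
    verb: str
    obj: str
    start_s: float  # assumed: start of the action, in seconds
    end_s: float    # assumed: end of the action, in seconds
    crops: List[Box] = field(default_factory=list)  # boxes for participating entities


def to_prompt_lines(evidence: List[ActionEvidence]) -> str:
    """Serialize evidence into time-ordered text lines for an MLLM prompt."""
    ordered = sorted(evidence, key=lambda e: e.start_s)  # preserve temporal sequence
    lines = []
    for i, e in enumerate(ordered, start=1):
        lines.append(
            f"[{i}] {e.start_s:.1f}-{e.end_s:.1f}s: "
            f"({e.subject}, {e.verb}, {e.obj}); crops={len(e.crops)}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    evidence = [
        ActionEvidence("child", "hugs", "dog", 4.0, 6.5,
                       [(10, 20, 110, 220), (120, 40, 300, 240)]),
        ActionEvidence("man", "opens", "car door", 0.5, 2.0, [(0, 0, 50, 50)]),
    ]
    print(to_prompt_lines(evidence))
```

Sorting by start time is one simple way to keep the temporal order explicit. In a real pipeline the crop boxes would presumably be used to pass the cropped image regions to the model alongside the text, rather than just being counted as here.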
AI-generated summary · Google Gemini · from 1 source.