LLMs struggle to mimic human video engagement ratings

By PulseAugur Editorial · [1 source] · 2026-06-09 04:00

Researchers evaluated multimodal large language models (MLLMs) as synthetic participants for rating how engaging viewers find videos. Using the Perceived Message Sensation Value (PMSV) framework, they compared human ratings with ratings from simulated participants generated by Gemini 3 Flash and Qwen 3 Omni. Even these advanced MLLMs showed limited agreement with human responses and exhibited systematic biases: lower average ratings and a tendency to cluster around the middle of the scale. Prompting strategies had mixed effects, and the models struggled to reproduce nuanced differences between participant subgroups or the way human ratings shift with participant profiles.

IMPACT Highlights the limits of current MLLMs in capturing subjective human responses, which constrains their use as synthetic participants in research on subjective responses.

RANK_REASON Academic paper evaluating LLM capabilities on a specific research task.



AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI · Prabal Shrestha, Bohan Jiang, Haoning Xue, Huan Liu, Xinyi Zhou · 2026-06-09 04:00

Multimodal Large Language Models as Synthetic Participants in Video-Based Studies: An Evaluation

arXiv:2606.07541v1 · Announce Type: cross

Abstract: Multimodal large language models (MLLMs) have shown strong performance on objective tasks such as video understanding and reasoning. However, it remains unclear whether they can approximate subjective human responses, which depend…
