Researchers evaluated multimodal large language models (MLLMs) as synthetic participants for assessing perceived engagement with videos. Using the Perceived Message Sensation Value (PMSV) framework, they compared human ratings with those from Gemini 3 Flash and Qwen 3 Omni simulations. The study found that even advanced MLLMs showed limited agreement with human responses, exhibiting biases like lower average ratings and a tendency towards central values. While prompting strategies had varied effects, the models struggled to replicate nuanced subgroup differences and participant profile sensitivities.
AI IMPACT Highlights limitations of current MLLMs in capturing subjective human responses, impacting their use in qualitative research.
RANK_REASON Academic paper evaluating LLM capabilities on a specific research task.
AI-generated summary · Google Gemini · from 1 source.