New MuseBench benchmark reveals MLLMs lack deep artistic understanding

By PulseAugur Editorial · [1 source] · 2026-06-30 04:00

Researchers have introduced MuseBench, a benchmark designed to evaluate how well multimodal large language models (MLLMs) understand art. It contains more than 4,000 questions spanning audiovisual art forms such as cinema, visual arts, and game design, and it tests reasoning about why creative choices were made rather than simple recognition. State-of-the-art MLLMs fall well short: the best-performing model reaches only 48.29% accuracy, versus 87.18% for human experts, a gap of nearly 39 percentage points.

IMPACT: Highlights a critical gap in MLLMs' ability to understand artistic intent and points to directions for future research on more nuanced AI capabilities.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Yuxuan Fan, Gyusik Seo, Jing Hao, Jaemin Cho, Mohit Bansal, Jaehong Yoon · 2026-06-30 04:00

MuseBench: Benchmarking Intent-Level Audiovisual Arts Understanding in MLLMs

arXiv:2606.30026v1 · Announce Type: cross

Abstract: Audiovisual arts encompass diverse creative disciplines, including cinema, visual arts, stage performance, and game design, where artistic meaning arises from deliberate combinations of visual, auditory, and narrative elements (e.…
