Researchers have introduced MuseBench, a new benchmark designed to evaluate how well multimodal large language models (MLLMs) understand art. The benchmark features over 4,000 questions across audiovisual art forms, including cinema, visual arts, and game design, and focuses on the reasoning behind creative choices rather than recognition alone. Current state-of-the-art MLLMs fall well short of human experts: the best-performing model achieves only 48.29% accuracy, compared to 87.18% for the experts, a gap of roughly 39 percentage points.
AI IMPACT Highlights a critical gap in MLLMs' ability to understand artistic intent, suggesting future research directions for more nuanced AI capabilities.
RANK_REASON New academic paper introducing a benchmark for evaluating MLLMs on artistic understanding.
AI-generated summary · Google Gemini · from 1 source.