A new research paper examines whether multimodal large language models (LLMs) can assess visual creativity zero-shot, that is, without task-specific training. The study tested six LLMs, including Gemini 3 Flash, Gemma 4-31B-it, and GPT-5.4 Mini, on AI-generated images and human sketches. The models' creativity ratings aligned with human ratings to varying degrees, with correlations ranging from .29 to .68. The LLMs' step-by-step reasoning made their evaluation criteria more interpretable, for example how they balance originality against quality. However, this reasoning did not improve their alignment with human judgments.
AI IMPACT Multimodal LLMs show potential for zero-shot visual creativity assessment and offer interpretable reasoning when evaluating AI-generated art and sketches.
RANK_REASON Academic paper detailing research findings on LLM capabilities.
AI-generated summary · Google Gemini · from 2 sources.