Automated Creativity Evaluation of Language Models Across Open-Ended Tasks
Researchers have developed a new automated framework for evaluating the creativity of large language models (LLMs) across a range of open-ended tasks. The domain-agnostic approach measures divergent creativity (novelty and diversity) with semantic entropy and convergent creativity (task fulfillment) with a multi-agent judge system. The authors validated the framework on LLMs in problem-solving, research ideation, and creative writing, and used it to show how model properties influence creative output.
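Semantic entropy is typically computed by sampling several responses to the same prompt, grouping them into clusters of equivalent meaning, and taking the entropy of the resulting cluster distribution. The summary does not say how this framework decides that two responses mean the same thing, so the sketch below rests on stated assumptions. `cluster_by_meaning`, `semantic_entropy`, and `naive_same_meaning` are hypothetical names, and the case-insensitive string match is only a stand-in for the entailment model or LLM judge that a real setup would use.

```python
import math
from typing import Callable, List

def cluster_by_meaning(responses: List[str],
                       same_meaning: Callable[[str, str], bool]) -> List[List[str]]:
    """Greedily group responses into semantic-equivalence clusters.

    Each response joins the first cluster whose first member it is judged
    equivalent to. Otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    for r in responses:
        for c in clusters:
            if same_meaning(c[0], r):
                c.append(r)
                break
        else:
            clusters.append([r])
    return clusters

def semantic_entropy(responses: List[str],
                     same_meaning: Callable[[str, str], bool]) -> float:
    """Discrete semantic entropy (in nats) of the cluster-size distribution."""
    clusters = cluster_by_meaning(responses, same_meaning)
    n = len(responses)
    probs = [len(c) / n for c in clusters]
    # H = -sum_c p(c) * log p(c), with p(c) = |c| / n
    return sum(-p * math.log(p) for p in probs)

def naive_same_meaning(a: str, b: str) -> bool:
    # Hypothetical stand-in for semantic equivalence. A real evaluator would
    # use bidirectional entailment (NLI) or an LLM judge instead.
    return a.strip().lower() == b.strip().lower()

if __name__ == "__main__":
    samples = ["A bridge made of kelp", "a bridge made of kelp",
               "A floating city", "A library on the Moon"]
    print(semantic_entropy(samples, naive_same_meaning))
```

Higher values mean the sampled outputs spread across more distinct meanings, which is the sense in which entropy tracks diversity. Comparing each response only against a cluster's first member assumes the equivalence judgment is transitive, a simplification that a production metric may not make.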
IMPACT Establishes a reproducible standard for evaluating LLM creativity, enabling scalable benchmarking and accelerating progress in creative AI.