Researchers have developed a new automated framework to evaluate the creativity of large language models (LLMs) across various open-ended tasks. This domain-agnostic approach uses semantic entropy to measure divergent creativity (novelty and diversity) and a multi-agent judge system for convergent creativity (task fulfillment). The framework was validated on LLMs in problem-solving, research ideation, and creative writing, revealing how model properties influence creative output.
AI IMPACT Establishes a reproducible standard for evaluating LLM creativity, enabling scalable benchmarking and accelerating progress in creative AI.
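To make the divergent-creativity measure concrete: semantic entropy is commonly computed as the Shannon entropy of how a model's responses to one prompt spread across clusters of equivalent meaning. The summary does not give the paper's exact formulation, so the following is only a minimal sketch under stated assumptions. It assumes the responses have already been grouped into semantic clusters by some separate step, and the function name `semantic_entropy` and the example labels are hypothetical.

```python
import math
from collections import Counter


def semantic_entropy(cluster_labels):
    """Shannon entropy (in nats) of responses distributed over semantic clusters.

    cluster_labels: one label per generated response; responses sharing a label
    are assumed to mean the same thing (the clustering step is not shown here).
    """
    if not cluster_labels:
        raise ValueError("need at least one response")
    total = len(cluster_labels)
    counts = Counter(cluster_labels)
    # H = -sum_k p_k * log(p_k), where p_k is the share of responses in cluster k
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical cluster assignments for six responses to the same prompt
diverse = ["a", "b", "c", "d", "e", "f"]      # every response is a distinct idea
repetitive = ["a", "a", "a", "a", "a", "b"]   # almost all responses repeat one idea

print(semantic_entropy(diverse) > semantic_entropy(repetitive))
```

Under this reading, a model whose responses all land in one cluster scores zero, and the score grows as responses spread more evenly over more distinct ideas.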
RANK_REASON The cluster contains an academic paper detailing a new research framework for evaluating LLM creativity.
- LLMs
- retrieval-based multi-agent judge framework
- Automated Creativity Evaluation of Language Models Across Open-Ended Tasks
- Large language models
- multi-agent judge framework
AI-generated summary · Google Gemini · from 3 sources.