Brief · PulseAugur

RESEARCH · arXiv cs.CL English (EN) · 3d · [3 sources]

Automated Creativity Evaluation of Language Models Across Open-Ended Tasks

Researchers have developed a new automated framework to evaluate the creativity of large language models (LLMs) across a range of open-ended tasks. The domain-agnostic approach uses semantic entropy to measure divergent creativity (novelty and diversity) and a retrieval-based multi-agent judge framework to measure convergent creativity (task fulfillment). The authors validated the framework on LLMs in problem-solving, research ideation, and creative writing, and used it to show how model properties influence creative output.
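The brief does not say exactly how the framework computes semantic entropy. As a rough illustration only, the sketch follows the common recipe: sample several responses, group the ones that share a meaning, and take the entropy of the resulting cluster distribution. The equivalence check `toy_equivalent` is a hypothetical stand-in (a normalized string match); published semantic-entropy methods typically use a model, such as a bidirectional entailment classifier, for this step. All function names are illustrative and are not the framework's API.

```python
import math
import string
from typing import Callable, List


def normalize(text: str) -> str:
    """Toy normalization: lowercase, drop punctuation, collapse whitespace."""
    stripped = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.split())


def toy_equivalent(a: str, b: str) -> bool:
    """Hypothetical stand-in for a semantic-equivalence check.

    Real semantic entropy usually relies on a model (e.g. bidirectional
    entailment); here two responses count as equivalent only if their
    normalized text matches exactly.
    """
    return normalize(a) == normalize(b)


def cluster_by_meaning(responses: List[str],
                       equivalent: Callable[[str, str], bool]) -> List[List[str]]:
    """Greedily put each response into the first cluster whose representative it matches."""
    clusters: List[List[str]] = []
    for r in responses:
        for cluster in clusters:
            if equivalent(cluster[0], r):
                cluster.append(r)
                break
        else:  # no existing cluster matched, so start a new one
            clusters.append([r])
    return clusters


def semantic_entropy(responses: List[str],
                     equivalent: Callable[[str, str], bool] = toy_equivalent) -> float:
    """Entropy (in nats) of how responses are spread across meaning clusters."""
    if not responses:
        raise ValueError("need at least one response")
    clusters = cluster_by_meaning(responses, equivalent)
    n = len(responses)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)


# Four sampled story premises; the first two say the same thing.
samples = [
    "A robot learns to paint.",
    "a robot learns to paint",
    "A whale writes poetry.",
    "Cities float on clouds.",
]
print(round(semantic_entropy(samples), 4))  # 1.0397 (clusters of size 2, 1, 1)
```

More distinct meanings among the samples push the score up, while a model that keeps rephrasing one idea scores near zero. That is the intuition behind treating the score as a signal of divergent creativity.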

IMPACT: Establishes a reproducible standard for evaluating LLM creativity, enabling scalable benchmarking and accelerating progress in creative AI.
