New Benchmark Evaluates AI Creative Writing Skills

By PulseAugur Editorial · [1 source] · 2026-05-26 17:28

A new creative writing benchmark focused on short stories has been released. It ranks models through head-to-head comparisons of stories written in response to specific creative prompts. Of the four models named in the source post, Baidu's Ernie 5.1 scored highest at -0.35, ahead of Qwen 3.7 Max (-2.01), Mistral Medium 3.5 (-2.13), and Grok 4.3 (-3.81).

IMPACT This benchmark could drive improvements in AI's creative writing capabilities and highlight areas for future model development.

RANK_REASON The cluster describes a new benchmark for evaluating AI models on a specific creative task.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/singularity TIER_2 English(EN) · /u/zero0_one1 · 2026-05-26 17:28

Short Story Creative Writing Benchmark. Baidu Ernie 5.1: -0.35, Qwen 3.7 Max: -2.01, Mistral Medium 3.5: -2.13, Grok 4.3: -3.81.

https://www.reddit.com/r/singularity/comments/1todw2r/short_story_creative_writing_benchmark_baidu/
