Brief · PulseAugur

TOOL · arXiv cs.CL · English (EN) · 1w

When Reasoning Supervision Hurts: TTCW-Based Long-Form Literary Review Generation

Researchers have built a dataset of more than 260,000 long-form stories, each annotated with creativity scores and review comments based on the Torrance Test of Creative Writing (TTCW). They fine-tuned Qwen3 models on this data to generate literary reviews and found that models trained without explicit reasoning supervision performed better. The study suggests that for structured, rubric-based review generation, reasoning supervision may not help and can even lead to irrelevant or repetitive outputs.

IMPACT: Introduces a novel dataset and methodology for AI-driven literary review generation, potentially improving automated evaluation of creative writing.

Qwen3
Torrance Test of Creative Writing (TTCW)