New benchmark evaluates human and LLM text-to-image prompting skills

By PulseAugur Editorial · [1 source] · 2026-05-21 15:51

Researchers have introduced AtelierEval, a benchmark that evaluates how well both humans and multimodal large language models (MLLMs) write text-to-image prompts. The benchmark comprises 360 expert-crafted tasks and aims to quantify the quality of the prompts that translate user intent into detailed instructions for text-to-image systems. AtelierEval also includes AtelierJudge, an agentic evaluator whose scores correlate strongly with human expert assessments. Its experiments suggest that mimicry-based prompting may be more effective than planning-based approaches for future prompters.

Impact: Introduces a new evaluation framework for text-to-image prompting, enabling better assessment of both human and AI prompter capabilities.

Ranking rationale: The cluster contains an academic paper introducing a new benchmark and evaluation methodology for text-to-image prompting.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · TIER_1 · English (EN) · Hanan Salam · 2026-05-21 15:51

AtelierEval: Agentic Evaluation of Humans & LLMs as Text-to-Image Prompters

Text-to-image (T2I) systems increasingly rely on upstream prompters, either humans or multimodal large language models (MLLMs), to translate user intent into detailed prompts. Yet current benchmarks fix the prompt and only evaluate T2I models, leaving the prompting proficiency of…

