Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty?

New benchmark shows AI models lag behind human experts at judging image aesthetics

By the PulseAugur editorial team · [1 source] · 2026-05-12 19:33

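Researchers have developed the Visual Aesthetic Benchmark (VAB) to evaluate how well multimodal large language models (MLLMs) judge the aesthetic quality of images. Their study finds that current frontier MLLMs perform markedly worse than human experts at comparative aesthetic assessment. Even the strongest system tested correctly identified the best and worst images in only 26.5% of tasks, compared with 68.9% for human experts, which underscores the gap in AI's capacity for aesthetic judgment.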
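To make the headline figure concrete, here is a minimal Python sketch of one way a joint best-and-worst accuracy could be computed. The record layout, the function name `best_worst_accuracy`, and the rule that a task counts as correct only when both picks match the expert labels are assumptions for illustration; the source does not spell out the benchmark's exact scoring procedure.

```python
from typing import Sequence, Tuple

# Hypothetical record for one task (layout is an assumption, not from the paper):
# (predicted_best, predicted_worst, expert_best, expert_worst),
# each an index into that task's group of candidate images.
Task = Tuple[int, int, int, int]


def best_worst_accuracy(tasks: Sequence[Task]) -> float:
    """Fraction of tasks where both the best and the worst pick match the expert labels."""
    if not tasks:
        raise ValueError("need at least one task")
    correct = sum(
        1 for pb, pw, eb, ew in tasks if pb == eb and pw == ew
    )
    return correct / len(tasks)


# Toy data, invented purely to exercise the function.
example_tasks = [
    (0, 3, 0, 3),  # both picks match
    (1, 2, 1, 0),  # best matches, worst does not
    (2, 0, 1, 0),  # worst matches, best does not
    (3, 1, 3, 1),  # both picks match
]
print(best_worst_accuracy(example_tasks))  # 0.5
```

Under this assumed rule, a partially correct answer earns no credit, which is one reason a joint score can sit well below the accuracy of either pick on its own.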

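Impact: The results highlight a significant gap in AI's ability to make nuanced aesthetic judgments, which may affect creative AI applications.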

Ranking rationale: This cluster describes a new academic benchmark and the evaluation of existing models against it.

AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.CV · Zhangchen Xu · 2026-05-12 19:33

Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty?

Multimodal large language models (MLLMs) are now routinely deployed for visual understanding, generation, and curation. A substantial fraction of these applications require an explicit aesthetic judgment. Most existing solutions reduce this judgment to predicting a scalar score f…
