New benchmark Arena-T2I Hard tests text-to-image faithfulness on complex prompts

By PulseAugur Editorial · [2 sources] · 2026-06-30 14:17

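Researchers have introduced Arena-T2I Hard, a new benchmark for evaluating the faithfulness of text-to-image models on complex, multi-faceted prompts. The benchmark is drawn from real user logs, and each prompt carries roughly 30 decomposed constraints covering spatial relations, stylistic nuance, and text rendering, aspects that simpler benchmarks often overlook. The study finds that top-tier systems still show a significant performance gap on this harder benchmark, and that aesthetic preference on public platforms does not necessarily correlate with fine-grained prompt adherence. To improve faithfulness, the authors propose a dependency-aware checklist reward that decomposes a prompt into a directed acyclic graph of questions, providing a finer-grained training signal. Combined with an aesthetic reward, this approach yields a better trade-off between faithfulness and aesthetics than simpler reward strategies on models such as SD3.5-Medium and FLUX.1-dev.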
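To make the idea of a dependency-aware checklist concrete, here is a minimal Python sketch. It is not the paper's implementation: the scoring rule (a question counts only if it and all of its prerequisite questions are answered "yes"), the function names `checklist_reward` and `combined_reward`, the question ids, and the mixing weight are all assumptions chosen for illustration. The yes/no answers would come from some judge model, which is not modeled here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def checklist_reward(answers, parents):
    """Score one image against a checklist arranged as a DAG.

    answers: dict mapping question id -> bool (raw yes/no answer from a judge).
    parents: dict mapping question id -> list of prerequisite question ids.
    Assumed rule (hypothetical): a question is satisfied only if its own
    answer is yes AND every prerequisite question is satisfied.
    """
    if not answers:
        raise ValueError("checklist must contain at least one question")
    # TopologicalSorter expects node -> predecessors and yields
    # prerequisites before the questions that depend on them.
    graph = {q: parents.get(q, []) for q in answers}
    satisfied = {}
    for q in TopologicalSorter(graph).static_order():
        deps_ok = all(satisfied[p] for p in parents.get(q, []))
        satisfied[q] = answers[q] and deps_ok
    return sum(satisfied.values()) / len(satisfied)


def combined_reward(faithfulness, aesthetic, weight=0.5):
    """Hypothetical linear mix of a faithfulness and an aesthetic score."""
    return weight * faithfulness + (1.0 - weight) * aesthetic


# Toy checklist for a prompt like "a black cat next to a red hat".
answers = {
    "cat_present": True,
    "cat_is_black": True,
    "hat_present": False,
    "hat_is_red": True,  # judge said yes, but the hat itself is missing
}
parents = {
    "cat_is_black": ["cat_present"],
    "hat_is_red": ["hat_present"],
}

dag_score = checklist_reward(answers, parents)
flat_score = sum(answers.values()) / len(answers)
print(dag_score, flat_score)
```

In this toy case the flat average credits `hat_is_red` even though no hat was detected, while the dependency-aware score does not, which is the kind of finer signal the summary describes.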

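Impact: The benchmark could drive improvements in text-to-image model capabilities, leading to more reliable and precise image generation for complex creative tasks.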

Ranking rationale: This cluster contains one academic paper introducing a new benchmark and method for evaluating text-to-image models.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Yuanhao Ban, Tong Xie, Sohyun An, Yunqi Hong, Evan Frick, I-Hung Hsu, Wei-Lin Chiang, Ion Stoica, Cho-Jui Hsieh · 2026-07-01 04:00

Arena-T2I Hard: Benchmarking and Improving Faithfulness with Dependency-Aware Checklist

arXiv:2606.31711v1 Announce Type: new Abstract: Faithfulness -- how precisely a generated image aligns with its prompt -- is increasingly central to the real-world utility of text-to-image (T2I) models. Existing faithfulness benchmarks, however, rely on simple atomic instructions…

arXiv cs.AI TIER_1 English(EN) · Cho-Jui Hsieh · 2026-06-30 14:17

Arena-T2I Hard: Benchmarking and Improving Faithfulness with Dependency-Aware Checklist

Faithfulness -- how precisely a generated image aligns with its prompt -- is increasingly central to the real-world utility of text-to-image (T2I) models. Existing faithfulness benchmarks, however, rely on simple atomic instructions, on which top-tier systems already achieve near…

