Layout-Conditioned Autoregressive Text-to-Image Generation via Structured Masking

New SMARLI framework strengthens layout control in text-to-image generation

By the PulseAugur editorial team · [1 source] · 2026-07-01 04:00

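Researchers have developed SMARLI, a new framework for layout-conditioned autoregressive text-to-image generation. The method applies a structured masking strategy inside the attention mechanism to integrate spatial layout constraints with text and image tokens while preventing feature entanglement. It also adopts a Group Relative Policy Optimization (GRPO) scheme adapted to the next-set paradigm, combining image-quality and layout rewards to mitigate exposure bias and improve generation accuracy. Experiments show that SMARLI strengthens layout control while retaining the efficiency of autoregressive models, and that it transfers to standard next-token models.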
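The two ideas in the summary can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the paper's implementation: the masking rule (condition tokens of different layout regions may not attend to each other, everything else is causal), the helper names `build_structured_mask` and `group_relative_advantages`, and the reward weight `w_layout` are all hypothetical. The only part taken from the standard GRPO formulation is the normalization of each sample's reward by the mean and standard deviation of its group.

```python
from statistics import mean, pstdev


def build_structured_mask(segments):
    """Return mask[i][j] == True when token i may attend to token j.

    `segments` labels each token in sequence order as "text", "image",
    or ("layout", k) for the k-th region's condition tokens.
    ASSUMED rule (hypothetical, for illustration only): attention is
    causal, and layout tokens of different regions never see each other.
    """
    n = len(segments)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # causal: only current and earlier positions
            a, b = segments[i], segments[j]
            both_layout = isinstance(a, tuple) and isinstance(b, tuple)
            # Block cross-region attention between layout condition tokens.
            mask[i][j] = not (both_layout and a != b)
    return mask


def group_relative_advantages(quality, layout, w_layout=0.5, eps=1e-8):
    """Group-relative advantages for one group of sampled images.

    The combined reward (quality + w_layout * layout) and the weight
    value are assumptions; the normalization by group mean and standard
    deviation follows the usual GRPO recipe.
    """
    rewards = [q + w_layout * l for q, l in zip(quality, layout)]
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    segs = [("layout", 0), ("layout", 1), "text", "image"]
    for row in build_structured_mask(segs):
        print(["x" if allowed else "." for allowed in row])
    print(group_relative_advantages([0.2, 0.8, 0.5], [0.1, 0.9, 0.4]))
```

In this sketch, samples that score above their group's average on the combined reward receive positive advantages and the rest receive negative ones, so no separate value network is needed to produce a baseline.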

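Impact: This research introduces a new method for improving layout control in text-to-image generation models, which could enable more precise and contextually appropriate image synthesis.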

Ranking rationale: This cluster contains an academic paper on a new framework for text-to-image generation.


Zirui Zheng

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Zirui Zheng, Takashi Isobe, Tong Shen, Xu Jia, Jianbin Zhao, Xiaomin Li, Mengmeng Ge, Baolu Li, Qinghe Wang, Dong Li, Dong Zhou, Yunzhi Zhuge, Huchuan Lu, Emad Barsoum · 2026-07-01 04:00

Layout-Conditioned Autoregressive Text-to-Image Generation via Structured Masking

arXiv:2509.12046v2. Abstract: Although autoregressive (AR) models have demonstrated remarkable success in image generation, extending these models to layout-conditioned generation remains challenging due to the sparse nature of layout conditions and th…
