New SMARLI framework enhances layout control in text-to-image generation

By PulseAugur Editorial · [1 source] · 2026-07-01 04:00

Researchers have developed SMARLI, a framework for layout-conditioned autoregressive text-to-image generation. It applies a structured masking strategy inside the attention mechanism, controlling how layout, text, and image tokens attend to one another so that spatial layout constraints are integrated without feature entanglement. The authors also adapt Group Relative Policy Optimization (GRPO) to a next-set-based generation paradigm, using image quality and layout rewards to mitigate exposure bias and improve generation accuracy. Experiments show that SMARLI improves layout control while retaining the efficiency of autoregressive models, and that the approach transfers to standard next-token-based models.
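To make the two mechanisms named in the summary more concrete, here is a minimal Python sketch. It is not the paper's implementation: the masking rule (text tokens causal, each object's layout tokens limited to the text and their own object, image tokens seeing all conditions plus earlier image tokens), the function names, and the equal reward weights are all assumptions chosen for illustration. Only the group-relative normalization of rewards follows the standard GRPO formulation.

```python
import numpy as np

def build_structured_mask(n_text, layout_group_sizes, n_image):
    """Boolean attention mask; mask[q, k] is True if query q may attend to key k.

    Hypothetical rule set, assumed for illustration only.
    Sequence order: text tokens, then layout tokens (one group per object),
    then image tokens.
    """
    n_layout = sum(layout_group_sizes)
    n = n_text + n_layout + n_image
    mask = np.zeros((n, n), dtype=bool)

    # Text tokens: ordinary causal attention among themselves.
    mask[:n_text, :n_text] = np.tril(np.ones((n_text, n_text), dtype=bool))

    # Layout tokens: see all text and their own object's tokens only,
    # so features of different objects do not mix (assumed rule).
    start = n_text
    for size in layout_group_sizes:
        end = start + size
        mask[start:end, :n_text] = True
        mask[start:end, start:end] = True
        start = end

    # Image tokens: see every condition token, causal among image tokens.
    # (A next-set model would likely allow attention within a set; this
    # sketch uses plain next-token causality for simplicity.)
    img0 = n_text + n_layout
    mask[img0:, :img0] = True
    mask[img0:, img0:] = np.tril(np.ones((n_image, n_image), dtype=bool))
    return mask

def group_relative_advantages(quality, layout, w_quality=0.5, w_layout=0.5, eps=1e-8):
    """GRPO-style advantages for one group of samples from the same prompt.

    The weighted sum of the two rewards is an assumption; the normalization
    (subtract group mean, divide by group std) is the standard GRPO step.
    """
    rewards = (w_quality * np.asarray(quality, dtype=float)
               + w_layout * np.asarray(layout, dtype=float))
    return (rewards - rewards.mean()) / (rewards.std() + eps)

mask = build_structured_mask(n_text=2, layout_group_sizes=[2, 2], n_image=3)
advantages = group_relative_advantages([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

In this toy setup the layout tokens of the first object cannot attend to those of the second, while every image token can attend to both. Samples that score above the group average receive positive advantages and those below receive negative ones.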

IMPACT The method improves layout control in autoregressive text-to-image models, which could help generated images follow specified object positions more precisely.

RANK_REASON The cluster contains an academic paper detailing a new framework for text-to-image generation.

Zirui Zheng

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Zirui Zheng, Takashi Isobe, Tong Shen, Xu Jia, Jianbin Zhao, Xiaomin Li, Mengmeng Ge, Baolu Li, Qinghe Wang, Dong Li, Dong Zhou, Yunzhi Zhuge, Huchuan Lu, Emad Barsoum · 2026-07-01 04:00

Layout-Conditioned Autoregressive Text-to-Image Generation via Structured Masking

arXiv:2509.12046v2 Announce Type: replace-cross Abstract: Although autoregressive (AR) models have demonstrated remarkable success in image generation, extending these models to layout-conditioned generation remains challenging due to the sparse nature of layout conditions and th…
