Brief · PulseAugur

TOOL · arXiv cs.CV · English (EN) · 10h

Shape of Thought: Progressive Object Assembly via Visual Chain-of-Thought

Researchers have introduced Shape-of-Thought (SoT), a visual Chain-of-Thought framework for improving compositional structure in text-to-image generation. SoT trains a multimodal autoregressive model to produce interleaved textual plans and intermediate visual states, so that an object is assembled progressively rather than generated in one pass. This helps with attribute binding and part-level relations without requiring explicit geometric representations. To support the framework, the authors built a dataset, SoT-26K, and a benchmark, T2S-CompBench. Models fine-tuned on SoT-26K reportedly improve substantially on component numeracy and structural topology compared with direct generation.
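To make the idea of "interleaved textual plans and intermediate visual states" concrete, here is a minimal sketch of how such a trace could be laid out as one sequence. It is illustrative only: the `AssemblyStep` class, the `interleave` helper, the `<plan>`/`<image>` markers, and the chair example are all assumptions made for this sketch, not the format used by SoT or SoT-26K.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AssemblyStep:
    """One hypothetical step of a progressive assembly trace."""
    plan: str          # textual plan for this step
    visual_state: str  # stand-in for the intermediate image (e.g. a file name)


def interleave(steps: List[AssemblyStep]) -> List[str]:
    """Flatten steps into one sequence: plan, image, plan, image, ..."""
    sequence: List[str] = []
    for step in steps:
        # The <plan> and <image> markers are made up for illustration.
        sequence.append(f"<plan>{step.plan}</plan>")
        sequence.append(f"<image>{step.visual_state}</image>")
    return sequence


# Hypothetical two-step trace for the prompt "a chair with four legs".
steps = [
    AssemblyStep("Add four legs", "state_1"),
    AssemblyStep("Add a seat on top of the legs", "state_2"),
]
trace = interleave(steps)
```

In a sequence like this, each plan precedes the visual state it describes, which is one plausible way an autoregressive model could condition every intermediate image on the plan so far.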

IMPACT Enhances compositional control in text-to-image models, potentially leading to more accurate and structured visual outputs.

arXiv
SoT
Shape of Thought
SoT-26K
T2S-CompBench
Yu Huo