
TOOL · arXiv cs.CV English(EN) · 16h

TextAlign: Preference Alignment for Text Rendering with Hierarchical Rewards

Researchers have developed TextAlign, a framework for improving text rendering in large text-to-image generative models. It treats text rendering as a post-training preference alignment problem, so the base model's architecture is left unchanged. A vision-language model supplies a hierarchical reward that identifies and penalizes rendering errors at the global, word, and glyph levels, improving OCR accuracy without compromising overall image quality.
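
As a rough illustration of the hierarchical idea only, the sketch below scores a rendered string against its target at three levels and picks a chosen/rejected pair for preference training. It is not TextAlign's implementation: the brief says the reward comes from a vision-language model, whereas this sketch substitutes plain string comparison on already-transcribed text. The weights, the function names, and the best-versus-worst pairing rule are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical weights: the brief does not say how the levels are combined.
WEIGHTS = {"global": 0.4, "word": 0.3, "glyph": 0.3}


def global_score(target: str, rendered: str) -> float:
    """1.0 if the whole rendered string matches the target, else 0.0."""
    return 1.0 if target.strip() == rendered.strip() else 0.0


def word_score(target: str, rendered: str) -> float:
    """Fraction of target words reproduced exactly at the same position."""
    t_words, r_words = target.split(), rendered.split()
    if not t_words:
        return 1.0
    hits = sum(
        1 for i, w in enumerate(t_words) if i < len(r_words) and r_words[i] == w
    )
    return hits / len(t_words)


def glyph_score(target: str, rendered: str) -> float:
    """Character-level similarity in [0, 1] (stand-in for a glyph check)."""
    return SequenceMatcher(None, target, rendered).ratio()


def hierarchical_reward(target: str, rendered: str) -> float:
    """Weighted sum of the three levels; higher means fewer rendering errors."""
    return (
        WEIGHTS["global"] * global_score(target, rendered)
        + WEIGHTS["word"] * word_score(target, rendered)
        + WEIGHTS["glyph"] * glyph_score(target, rendered)
    )


def preference_pair(target: str, candidates: list[str]) -> tuple[str, str]:
    """Return (chosen, rejected): the highest- and lowest-reward candidates."""
    ranked = sorted(
        candidates, key=lambda c: hierarchical_reward(target, c), reverse=True
    )
    return ranked[0], ranked[-1]


if __name__ == "__main__":
    # Transcriptions of three hypothetical generations for one prompt.
    chosen, rejected = preference_pair(
        "OPEN DAILY", ["OPEN DALIY", "OPEN DAILY", "OPNE DIALY"]
    )
    print("chosen:", chosen, "| rejected:", rejected)
```

Separating the levels means a single transposed letter still lowers the reward even when most words are correct, which is the kind of fine-grained error signal the summary attributes to the glyph level.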

IMPACT Enhances text rendering in generative models, potentially improving usability for applications requiring accurate text generation within images.

FLUX.1-dev
Qwen-Image
Z-Image-Turbo
Jingpu Yang
TextAlign
SD3.5
AnyText
TextDiffuser