TextAlign: Preference Alignment for Text Rendering with Hierarchical Rewards
Researchers have developed TextAlign, a framework for improving the text rendering of large text-to-image generative models. The method treats text rendering as a post-training preference alignment problem, so it requires no architectural changes to the base model. TextAlign uses a hierarchical reward system built on a vision-language model to identify and penalize rendering errors at the global, word, and glyph levels, improving OCR accuracy without compromising overall image quality.
AI IMPACT: Enhances text rendering in generative models, potentially improving usability for applications that require accurate text within images.
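
To make the idea of a hierarchical reward feeding preference alignment more concrete, here is a minimal Python sketch. It is not TextAlign's actual implementation: the summary does not say how the three levels are combined or which preference-optimization objective is used. The `LevelScores` fields, the weights, the `margin` threshold, and the weighted-sum combination are all assumptions made for illustration, and the per-level scores are taken as given rather than produced by a real vision-language model.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class LevelScores:
    """Hypothetical per-image scores in [0, 1]; higher means fewer rendering errors."""
    global_score: float  # overall layout and legibility of the rendered text
    word_score: float    # fraction of target words rendered correctly
    glyph_score: float   # fraction of individual characters rendered correctly


def hierarchical_reward(scores, weights=(0.2, 0.3, 0.5)):
    """Combine the three levels into one scalar (assumed weighted sum)."""
    w_global, w_word, w_glyph = weights
    return (w_global * scores.global_score
            + w_word * scores.word_score
            + w_glyph * scores.glyph_score)


def build_preference_pairs(candidates, margin=0.1):
    """Return (chosen, rejected) name pairs whose reward gap is at least `margin`.

    `candidates` maps an image name to its LevelScores. Pairs with a smaller
    gap are skipped because they carry little preference signal.
    """
    rewards = {name: hierarchical_reward(s) for name, s in candidates.items()}
    pairs = []
    for a, b in combinations(rewards, 2):
        gap = rewards[a] - rewards[b]
        if gap >= margin:
            pairs.append((a, b))
        elif -gap >= margin:
            pairs.append((b, a))
    return pairs


# Three hypothetical generations for the same prompt.
candidates = {
    "clean": LevelScores(1.0, 1.0, 1.0),
    "typo": LevelScores(0.9, 0.5, 0.4),
    "near": LevelScores(1.0, 1.0, 0.95),
}
pairs = build_preference_pairs(candidates)
```

Pairs built this way could then be passed to a preference-optimization step during post-training, which is consistent with the summary's point that the base model's architecture is left unchanged. Weighting the glyph level most heavily is only one plausible choice; the summary gives no indication of the real weighting.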