Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models
Researchers have introduced Lens, a 3.8B-parameter text-to-image model that achieves competitive performance with significantly less training compute than larger models, using dense-caption datasets and an efficient architecture. It generates high-resolution images quickly and supports multilingual prompts. Separately, a new framework called RankE has been developed for discrete text-to-image models; it jointly optimizes the generator and decoder to improve both alignment and image fidelity, addressing latent covariate shift.
AI IMPACT: Lens demonstrates a path to more compute-efficient training of large text-to-image models, while RankE offers a novel approach to improving the quality of discrete generation models.