Brief · PulseAugur

RESEARCH · Hugging Face Daily Papers English(EN) · 6d · [12 sources]

Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models

Researchers have introduced Lens, a 3.8B-parameter text-to-image model that achieves competitive performance while using significantly less training compute than larger models, relying on dense-caption datasets and an efficient architecture. It generates high-resolution images quickly and supports multilingual prompts. Separately, a new framework called RankE has been developed for discrete text-to-image models. It jointly optimizes the generator and decoder to improve both alignment and image fidelity, addressing latent covariate shift.

IMPACT: Lens demonstrates a path to more efficient training of large text-to-image models, while RankE offers a novel approach to improving the quality of discrete generation models.

MS-COCO 30K
LlamaGen-XL
RankE
GPT-4
Lens
Z-Image
NVIDIA H100 GPU
GPT-4.1
arXiv
Hugging Face