Researchers have developed an end-to-end training pipeline for autoregressive image generation that jointly optimizes reconstruction and generation. Because both objectives are trained together, the visual tokenizer receives direct supervision from the generation results, unlike previous methods that trained tokenizers and generative models separately. The model also leverages vision foundation models to enhance 1D tokenizers, and it achieves a state-of-the-art FID score of 1.48 on ImageNet 256x256 generation without guidance.
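To make "jointly optimizes reconstruction and generation" concrete, here is a minimal sketch of what a combined objective could look like. It is not the paper's implementation: the weighted-sum form, the use of mean squared error for reconstruction, and the names `mse`, `next_token_cross_entropy`, `joint_loss`, and `gen_weight` are all assumptions made for illustration. The summary does not state the actual loss terms or their weighting.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two equal-length lists of pixel values."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def next_token_cross_entropy(logits_per_step, target_tokens):
    """Average negative log-likelihood of each target token under softmax(logits)."""
    total = 0.0
    for logits, target in zip(logits_per_step, target_tokens):
        # Numerically stable log-sum-exp for the softmax normalizer.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[target]
    return total / len(target_tokens)


def joint_loss(original, reconstructed, logits_per_step, target_tokens, gen_weight=1.0):
    """Hypothetical joint objective: reconstruction loss plus weighted generation loss.

    In an end-to-end setup, both terms would be backpropagated into the
    tokenizer; this sketch only computes the scalar value.
    """
    recon = mse(original, reconstructed)
    gen = next_token_cross_entropy(logits_per_step, target_tokens)
    return recon + gen_weight * gen


# Toy example: two pixels, one predicted token over a two-entry codebook.
loss = joint_loss(
    original=[0.0, 1.0],
    reconstructed=[0.0, 0.5],
    logits_per_step=[[0.0, 0.0]],
    target_tokens=[0],
    gen_weight=2.0,
)
print(loss)
```

In a separately trained pipeline, only the first term would shape the tokenizer. Under the assumption above, the second term is what would let generation quality influence the tokens themselves.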
IMPACT Introduces an end-to-end training approach for autoregressive image generation models, potentially improving efficiency and performance.
RANK_REASON Academic paper detailing a new method for autoregressive image generation.