Researchers have developed an end-to-end training pipeline for autoregressive image generation that jointly optimizes reconstruction and generation. Unlike previous methods, which trained the tokenizer and the generative model separately, this approach lets the generation results supervise the visual tokenizer directly. The model uses vision foundation models to enhance 1D tokenizers and achieves a state-of-the-art FID score of 1.48 on ImageNet 256×256 generation without guidance.
Impact: Introduces a novel end-to-end training approach for image generation models, potentially improving efficiency and performance.
Ranking rationale: Academic paper detailing a new method for autoregressive image generation.
AI-generated summary · Google Gemini · From 2 sources.