MergeTok: Unified Continuous and Discrete Visual Tokenization via Token Merging
Researchers have developed MergeTok, a visual tokenizer that unifies continuous and discrete approaches for image generation. The method uses token merging to bridge the gap between VAEs and VQ models, enabling better semantic control and more stable training. MergeTok demonstrates competitive performance on image generation tasks with lower reconstruction error than existing models, offering a single architecture that combines robust semantic organization with generator-friendly discreteness.
AI IMPACT: Introduces a unified approach to visual tokenization, potentially improving image generation quality and control.