MergeTok unifies visual tokenization for image generation

By PulseAugur Editorial · 1 source · 2026-06-01 04:00

Researchers have developed MergeTok, a visual tokenizer for image generation that unifies the continuous and discrete approaches. It uses token merging to bridge the gap between continuous VAEs and discrete vector-quantized (VQ) models, with the aim of giving better semantic control and more stable training. According to the paper, MergeTok achieves competitive image generation performance with lower reconstruction error than existing models, offering a single architecture that combines robust semantic organization with generator-friendly discreteness.
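
The summary mentions token merging only at a high level. As a rough, generic illustration (not MergeTok's actual procedure, which the summary does not describe), the sketch below greedily averages the most similar adjacent pair of token vectors, scored by cosine similarity, until a target number of tokens remains. The function names, the adjacent-pair rule and the toy vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def merge_tokens(tokens, target_len):
    """Hypothetical greedy merge: repeatedly replace the most similar
    adjacent pair of tokens with their element-wise mean until only
    target_len tokens remain."""
    tokens = [list(t) for t in tokens]
    while len(tokens) > target_len:
        # Index of the adjacent pair with the highest cosine similarity.
        best = max(range(len(tokens) - 1),
                   key=lambda i: cosine(tokens[i], tokens[i + 1]))
        a, b = tokens[best], tokens[best + 1]
        # Collapse the pair into a single averaged token.
        tokens[best:best + 2] = [[(x + y) / 2 for x, y in zip(a, b)]]
    return tokens


# Toy 2-D "patch tokens": two near-duplicate pairs.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
merged = merge_tokens(tokens, 2)
print(merged)
```

Here each near-duplicate pair collapses into one averaged token, so four tokens become two. Real merging schemes typically use more careful matching and weighting; the greedy adjacent rule is chosen only to keep the idea visible.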

IMPACT Introduces a unified approach to visual tokenization, potentially improving image generation quality and control.

RANK_REASON Academic paper introducing a new method for visual tokenization.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Luyuan Zhang, Siyuan Li, Zedong Wang, Qingsong Xie, Cheng Tan, Anna Wang, Yanhao Zhang, Chen Chen, Haonan Lu, Haoqian Wang · 2026-06-01 04:00

MergeTok: Unified Continuous and Discrete Visual Tokenization via Token Merging

arXiv:2605.30904v1 Announce Type: new Abstract: Most visual tokenizers for image generation are bifurcated into two families with complementary limitations: continuous VAEs offer high-fidelity reconstruction but suffer from dense, entangled latents that are poorly suited for sema…
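
The abstract contrasts continuous VAEs with discrete tokenizers. For readers unfamiliar with the discrete side, here is a minimal sketch of a generic vector-quantization step, assuming a standard nearest-codebook lookup under squared Euclidean distance. Each continuous latent becomes the index of its closest codebook entry, whereas a continuous VAE would pass the latent on unchanged. The codebook and latents are made-up toy values, and this is not MergeTok's quantizer.

```python
def quantize(latents, codebook):
    """Map each continuous latent vector to the index of its nearest
    codebook entry under squared Euclidean distance (VQ-style lookup)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    indices = [min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))
               for z in latents]
    # Discrete tokens (indices) plus the quantized vectors a decoder would see.
    quantized = [codebook[k] for k in indices]
    return indices, quantized


codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy 3-entry codebook (hypothetical)
latents = [[0.9, 0.1], [0.2, 0.8], [0.1, -0.1]]  # toy continuous encoder outputs
indices, quantized = quantize(latents, codebook)
print(indices)  # [1, 2, 0]
```

The integer indices are what an autoregressive generator would model; the small mismatch between each latent and its codebook vector is the information a purely discrete tokenizer gives up.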
