Distilling Drifting Transformers with Representation Autoencoders
Researchers have developed Drift-RAE, a method that improves distillation for representation autoencoders (RAEs). It addresses the anisotropy and large curvature of RAE latent spaces, which previously hindered training stability. By applying the drifting paradigm to RAEs and adding modifications that stabilize training, Drift-RAE achieves competitive results on ImageNet 256×256 with significantly fewer distillation steps than existing methods.
AI IMPACT: This research could lead to more efficient training of generative models by improving distillation techniques for representation autoencoders.