Why Are DMD Students Lazy? Understanding the Copying Behavior in Few-Step Distillation
Researchers have identified a phenomenon called 'copying' that arises when diffusion models are distilled in high-dimensional settings. A distilled student model reproduces the teacher model's original noise-data pairings, a behavior not observed in lower-dimensional settings. The study suggests that copying is an emergent property of the student model's limited geometric freedom during distillation, rather than a consequence of adversarial objectives or teacher memorization.
IMPACT: Identifies a new behavior in diffusion model distillation, with potential implications for the efficiency and generalization of compressed models.