What Do Students Learn? A Feature-Level Analysis of Dark Knowledge
Researchers have developed a new method called Confusion Distillation (CD) to improve self-distillation in machine learning models. This technique analyzes the feature learning process in student models, revealing that effective distillation acts as a regularizer by removing sample-specific features and promoting the use of reusable ones. The CD method leverages the confusion matrix, which contains structural information analogous to a teacher model's "dark knowledge," to create dynamic soft targets for training. Experiments on CIFAR-100 showed CD outperforming existing self-distillation methods.
AI IMPACT: This method could lead to more efficient model compression and improved performance in self-supervised learning tasks.
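To make the idea of confusion-matrix soft targets concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the summary does not give CD's exact formulation, so the soft confusion matrix accumulated from the model's own predictions, the mixing weight `alpha`, and the function names `update_confusion` and `soft_targets` are all assumptions made for illustration.

```python
import numpy as np


def update_confusion(confusion, probs, labels):
    """Add each sample's predicted probabilities to the row of its true class.

    Assumption: a "soft" confusion matrix built from probabilities rather
    than hard argmax counts. confusion is (C, C), probs is (N, C),
    labels is an integer array of shape (N,).
    """
    # np.add.at accumulates correctly even when a label repeats in the batch.
    np.add.at(confusion, labels, probs)
    return confusion


def soft_targets(confusion, labels, alpha=0.5, eps=1e-8):
    """Blend one-hot labels with the row-normalized confusion matrix.

    alpha is a hypothetical mixing weight, not a value from the source.
    """
    num_classes = confusion.shape[0]
    row_sums = confusion.sum(axis=1, keepdims=True)
    # Row c approximates how the model spreads probability for true class c.
    class_dist = confusion / (row_sums + eps)
    one_hot = np.eye(num_classes)[labels]
    mixed = (1.0 - alpha) * one_hot + alpha * class_dist[labels]
    # Classes with no accumulated statistics fall back to plain one-hot labels.
    seen = row_sums[labels] > 0
    return np.where(seen, mixed, one_hot)


def soft_cross_entropy(probs, targets, eps=1e-8):
    """Mean cross-entropy between predicted probabilities and soft targets."""
    return float(-(targets * np.log(probs + eps)).sum(axis=1).mean())
```

Because the matrix would be refreshed as training proceeds, the targets change over time, which is one plausible reading of "dynamic" in the summary. In this sketch, classes the model often confuses with the true class receive more target mass, loosely mirroring the inter-class structure that a teacher's soft outputs would carry.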