Toward Understanding Adversarial Distillation: Why Robust Teachers Fail
Researchers have identified a key mechanism behind the inconsistent success of adversarial distillation, a technique used to improve the robustness of a student model. When a robust teacher gives confident supervision on data points that the student struggles to learn, the student tends to overfit to noise. Conversely, a teacher that shows uncertainty on these challenging samples helps the student focus on learnable, robust signals, which leads to better generalization.
IMPACT: Provides a theoretical framework and practical guidance for selecting teachers in adversarial distillation, potentially improving the robustness of AI models.
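
To make the mechanism in the summary concrete, here is a minimal sketch, not the researchers' method. It assumes one common formulation of adversarial distillation, in which the student's prediction on an adversarial input is pulled toward the teacher's soft label through a KL-divergence loss; the summary does not state which loss the work uses. All logits and names (`confident_teacher`, `uncertain_teacher`, `student_adv_logits`) are hypothetical values chosen for illustration only.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities (numerically stable)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means a less confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student): how far the student is from the teacher's target."""
    return sum(
        t * math.log((t + eps) / (s + eps))
        for t, s in zip(teacher_probs, student_probs)
        if t > 0.0
    )

def distillation_loss(teacher_logits, student_adv_logits, temperature=1.0):
    """Assumed loss: KL between the teacher's soft label and the student's
    prediction on the adversarial input. This is an illustrative choice."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_adv_logits, temperature)
    return kl_divergence(teacher_probs, student_probs)

# Hypothetical logits for ONE hard sample with three classes.
student_adv_logits = [0.2, 0.1, 0.0]   # student is close to guessing
confident_teacher = [8.0, 0.0, 0.0]    # teacher is nearly certain
uncertain_teacher = [1.0, 0.8, 0.6]    # teacher hedges across classes

confident_entropy = entropy(softmax(confident_teacher))
uncertain_entropy = entropy(softmax(uncertain_teacher))
confident_loss = distillation_loss(confident_teacher, student_adv_logits)
uncertain_loss = distillation_loss(uncertain_teacher, student_adv_logits)

print(f"teacher entropy: confident={confident_entropy:.4f}, uncertain={uncertain_entropy:.4f}")
print(f"distill loss:    confident={confident_loss:.4f}, uncertain={uncertain_loss:.4f}")
```

Under these assumed numbers, the confident teacher has lower entropy and produces a larger loss on the hard sample, so that sample exerts a stronger pull on the student during training. The uncertain teacher's soft label is already close to what the student can express, so the sample contributes little. Teacher entropy on hard samples is one plausible diagnostic when comparing candidate teachers, though the summary does not say which criterion the researchers recommend.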