Counteraction-Aware Multi-Teacher On-Policy Distillation for General Capability Recovery with Domain Preservation
Researchers have proposed Counteraction-Aware Multi-Teacher On-Policy Distillation (CaMOPD), a method for recovering the general capabilities that large language models (LLMs) lose after domain specialization. Existing methods often struggle when the training data distribution of the general teachers is unknown. CaMOPD addresses this with decoupled alternating training and a gap-based sample selection strategy: general recovery gets dedicated updates, domain preservation is checked periodically, and correction signals are concentrated on samples with larger teacher-student log-probability gaps. In the reported experiments, CaMOPD outperforms baselines on general recovery while maintaining domain-specific behavior in role-play dialogue and medical reasoning.
AI IMPACT This research offers an approach to recovering general capabilities lost during domain specialization, potentially leading to more versatile models.
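The gap-based sample selection mentioned in the summary can be illustrated with a small sketch. This is not the authors' implementation: the function names `sequence_gap` and `select_by_gap`, the `keep_fraction` parameter, the use of a mean per-token gap, and ranking by absolute value are all assumptions made for illustration. It only shows the general idea of keeping the samples where teacher and student log-probabilities differ most.

```python
import math
from typing import List, Sequence


def sequence_gap(teacher_logprobs: Sequence[float],
                 student_logprobs: Sequence[float]) -> float:
    """Mean per-token (teacher - student) log-probability gap for one sample.

    Assumption: both sequences score the same student-sampled tokens.
    """
    if len(teacher_logprobs) != len(student_logprobs):
        raise ValueError("teacher and student must score the same tokens")
    if not teacher_logprobs:
        return 0.0
    total = sum(t - s for t, s in zip(teacher_logprobs, student_logprobs))
    return total / len(teacher_logprobs)


def select_by_gap(gaps: Sequence[float], keep_fraction: float = 0.5) -> List[int]:
    """Return indices of the samples with the largest absolute gaps.

    Assumption: a fixed fraction is kept; the paper may use another rule.
    """
    if not gaps:
        return []
    k = max(1, math.ceil(len(gaps) * keep_fraction))
    ranked = sorted(range(len(gaps)), key=lambda i: abs(gaps[i]), reverse=True)
    return sorted(ranked[:k])


# Hypothetical per-token log-probabilities for four sampled responses.
teacher = [[-0.1, -0.2], [-1.0, -1.0], [-0.5, -0.5], [-0.3, -0.3]]
student = [[-0.1, -0.3], [-3.0, -2.0], [-0.6, -0.5], [-2.3, -0.3]]

gaps = [sequence_gap(t, s) for t, s in zip(teacher, student)]
selected = select_by_gap(gaps, keep_fraction=0.5)
print(selected)
```

In a training loop, only the selected indices would contribute to the distillation loss for that step, so the correction signal goes to the samples where the student deviates most from the teacher.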