SAE-FD: Sparse Autoencoder Feature Distillation for Continual Learning of Large Language Models
Researchers have developed Sparse Autoencoder Feature Distillation (SAE-FD), a method for reducing catastrophic forgetting in large language models during continual learning. The approach uses the sparse feature space of a pre-trained sparse autoencoder (SAE) to disentangle learned concepts, so that regularization can target individual features more precisely. In the reported experiments, SAE-FD significantly outperforms existing regularization techniques on continual learning benchmarks, improving accuracy with minimal negative transfer.
AI IMPACT: This method could enable LLMs to learn new information more effectively without losing previously acquired knowledge, improving their adaptability.
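The summary does not give the exact loss, so the following is only a minimal sketch of what regularizing in an SAE feature space might look like. It assumes a frozen ReLU encoder and a mean squared penalty on feature drift between the model before and after an update. The names `sae_encode`, `feature_distillation_penalty`, `W_enc`, and `b_enc` are hypothetical and not taken from the paper.

```python
import numpy as np

def sae_encode(h, W_enc, b_enc):
    """Map hidden activations to non-negative SAE features (assumed ReLU encoder)."""
    return np.maximum(h @ W_enc + b_enc, 0.0)

def feature_distillation_penalty(h_old, h_new, W_enc, b_enc):
    """Mean squared drift between old-model and new-model features in SAE space.

    Assumption: the SAE weights are frozen, and only h_new would carry gradients
    in a real training loop.
    """
    z_old = sae_encode(h_old, W_enc, b_enc)
    z_new = sae_encode(h_new, W_enc, b_enc)
    return float(np.mean((z_new - z_old) ** 2))

# Toy data standing in for LLM hidden states (illustrative sizes only).
rng = np.random.default_rng(0)
d_model, d_sae, batch = 8, 32, 4
W_enc = rng.normal(size=(d_model, d_sae))
b_enc = np.zeros(d_sae)

h_old = rng.normal(size=(batch, d_model))                # activations before the update
h_new = h_old + 0.1 * rng.normal(size=(batch, d_model))  # activations after the update

penalty_same = feature_distillation_penalty(h_old, h_old, W_enc, b_enc)
penalty_drift = feature_distillation_penalty(h_old, h_new, W_enc, b_enc)
print(penalty_same, penalty_drift)
```

In a training loop, a penalty of this kind would typically be added to the new-task loss with a weighting coefficient, so that updates which shift previously learned features cost more than updates which leave them intact.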