
RESEARCH · arXiv cs.AI · English (EN) · 1w · [4 sources]

TFGN: Task-Free, Replay-Free Continual Pre-Training Without Catastrophic Forgetting at LLM Scale

Researchers have proposed new architectural approaches to catastrophic forgetting in large language models during continual pre-training and fine-tuning. One method, TFGN, adds an overlay that supports parameter-efficient updates without altering the core transformer, and reports significant retention of prior knowledge across diverse domains and model scales. Another, UAM, takes inspiration from biological vision: it uses a dual-stream architecture to separate semantic understanding from action control, preserving multimodal capabilities during training of vision-language-action (VLA) models. Both aim to let models learn continuously without degrading performance on previously acquired knowledge.

IMPACT New architectural designs for LLMs and VLA models promise improved continual learning capabilities, reducing knowledge degradation during fine-tuning and pre-training.