Nous Research Releases Token Superposition Training to Speed Up LLM Pre-Training by Up to 2.5x Across 270M to 10B Parameter Models
Nous Research has developed Token Superposition Training (TST), a new method designed to significantly accelerate the pre-training of large language models. The technique can reduce pre-training time by up to 2.5x for models ranging from 270 million to 10 billion parameters, without altering the model's architecture or how it performs inference. TST achieves this by modifying the training loop in two phases: an initial 'superposition' phase in which consecutive token embeddings are averaged into larger bags and processed together, followed by a 'recovery' phase that reverts to standard per-token training. In experiments, TST reached a lower final training loss in substantially less compute time than standard pre-training.
IMPACT: Accelerates LLM pre-training, potentially reducing the compute cost and time needed to develop new large language models.
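The two-phase loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Nous Research's implementation: the bag size, the `superposition_steps` cutoff, the function names, and the embedding shapes are all hypothetical, and real TST would operate inside a full training loop with loss computation and optimizer updates.

```python
import numpy as np


def bag_embeddings(embeddings: np.ndarray, bag_size: int) -> np.ndarray:
    """Average consecutive token embeddings into 'bags' of bag_size tokens.

    embeddings: (seq_len, dim) array; seq_len is assumed divisible by bag_size.
    Returns a (seq_len // bag_size, dim) array of averaged embeddings, so each
    training step processes fewer positions during the superposition phase.
    """
    seq_len, dim = embeddings.shape
    return embeddings.reshape(seq_len // bag_size, bag_size, dim).mean(axis=1)


def prepare_inputs(embeddings: np.ndarray, step: int,
                   superposition_steps: int, bag_size: int) -> np.ndarray:
    """Hypothetical per-step input preparation for the two-phase schedule.

    Superposition phase (early steps): train on averaged bags.
    Recovery phase (later steps): revert to standard per-token inputs.
    """
    if step < superposition_steps:
        return bag_embeddings(embeddings, bag_size)
    return embeddings


# Example: 8 tokens of dimension 4, bags of 4 tokens during superposition.
x = np.random.randn(8, 4)
early = prepare_inputs(x, step=0, superposition_steps=100, bag_size=4)
late = prepare_inputs(x, step=100, superposition_steps=100, bag_size=4)
print(early.shape)  # (2, 4) — shorter effective sequence in superposition
print(late.shape)   # (8, 4) — full sequence in recovery
```

Processing a shorter effective sequence per step is one way such a scheme could cut compute early in training; the recovery phase then restores the standard objective so inference is unchanged.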