Scaling and Optimizing Frontier Model Training
Fireworks AI has developed a new training infrastructure that enables the fine-tuning of trillion-parameter Mixture-of-Experts (MoE) models, overcoming previous memory and orchestration bottlenecks. This platform was instrumental in the recent release of Cursor's Composer 2.5, a coding model that achieved top performance on several benchmarks. The system utilizes techniques like low-precision expert quantization and optimizer state offloading to manage the memory demands of large MoE models, making them more accessible for training and fine-tuning. AI
IMPACT Enables training of trillion-parameter MoE models, potentially accelerating the development of more capable frontier models.