Fireworks AI enables training of trillion-parameter MoE models

By PulseAugur Editorial · [2 sources] · 2026-05-18 19:53

Fireworks AI has built training infrastructure that enables the fine-tuning of trillion-parameter Mixture-of-Experts (MoE) models, overcoming previous memory and orchestration bottlenecks. The platform was instrumental in the recent release of Cursor's Composer 2.5, a coding model that achieved top performance on several benchmarks. The system uses techniques such as low-precision expert quantization and optimizer state offloading to manage the memory demands of large MoE models, making those models more practical to train and fine-tune.
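To give a rough sense of why the two named techniques matter, the following back-of-the-envelope sketch compares accelerator memory for weights and optimizer state with and without them. Every number in it is an illustrative assumption rather than a Fireworks figure: the 95% expert share, the 4-bit expert format, and the 12 bytes per parameter of Adam-style state (FP32 master weights plus two moments) are placeholders, and gradients and activations are ignored.

```python
# Back-of-the-envelope estimate of accelerator memory for weights and
# optimizer state. All sizes and ratios are illustrative assumptions,
# not figures published by Fireworks AI.

def accelerator_bytes(expert_params, dense_params, expert_bits, dense_bits,
                      optimizer_bytes_per_param, offload_optimizer):
    """Bytes of weights plus optimizer state resident on accelerators."""
    weight_bytes = (expert_params * expert_bits) // 8 + (dense_params * dense_bits) // 8
    total_params = expert_params + dense_params
    # Offloaded optimizer state lives in host memory, so it costs nothing here.
    optimizer_bytes = 0 if offload_optimizer else total_params * optimizer_bytes_per_param
    return weight_bytes + optimizer_bytes

# Hypothetical 1-trillion-parameter MoE with 95% of parameters in experts.
EXPERT_PARAMS = 950 * 10**9
DENSE_PARAMS = 50 * 10**9

# Baseline: 16-bit weights everywhere, optimizer state kept on the accelerators.
baseline = accelerator_bytes(EXPERT_PARAMS, DENSE_PARAMS, expert_bits=16,
                             dense_bits=16, optimizer_bytes_per_param=12,
                             offload_optimizer=False)

# With 4-bit experts and the optimizer state offloaded to host memory.
optimized = accelerator_bytes(EXPERT_PARAMS, DENSE_PARAMS, expert_bits=4,
                              dense_bits=16, optimizer_bytes_per_param=12,
                              offload_optimizer=True)

print(f"baseline:  {baseline / 10**12:.3f} TB")
print(f"optimized: {optimized / 10**12:.3f} TB")
```

Under these assumptions the optimizer state dominates the baseline, which is why moving it off the accelerators and shrinking the expert weights change the total so sharply. Offloading does not remove the cost: the state still has to fit in host memory and be transferred during each update.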

IMPACT Enables training of trillion-parameter MoE models, potentially accelerating the development of more capable frontier models.

RANK_REASON Fireworks AI's blog post details their infrastructure for training large MoE models, which was used to train Cursor's Composer 2.5.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

X — Fireworks (inference infra) TIER_1 English(EN) · FireworksAI_HQ · 2026-05-18 19:53

The @cursor_ai team shipped Composer 2 and now Composer 2.5 on the same Kimi K2.5 base model. Performance benchmarks are📈. Frontier quality and open-source economics. 85% of the compute powering these gains came from RL. Fireworks powers the RL rollouts. Learn more about https:/…

Fireworks AI blog TIER_1 English(EN) · 2026-05-25 03:01

Scaling and Optimizing Frontier Model Training

Fireworks Training SDK provides the model catalog, parallelism stack, precision kernels, and memory optimizations that make it possible to fine-tune trillion-parameter MoE models on current hardware.
