Fireworks AI highlights Step 3.7 Flash, StepFun's 196B MoE model built for inference

By PulseAugur Editorial · [1 source] · 2026-06-01 23:34

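Fireworks AI has highlighted Step 3.7 Flash, a 196-billion-parameter Mixture-of-Experts (MoE) model built by StepFun with inference efficiency as a design goal from the start. Fireworks contrasts this with the many research labs that, in its words, consider inference efficiency only after the fact. The post names two techniques: Multi-Matrix Factorization Attention (MFA), credited with a KV cache at roughly 22% of DeepSeek's, and Attention-FFN Disaggregation (AFD).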

IMPACT This model's focus on inference efficiency could lead to more cost-effective AI deployments.

RANK_REASON Release of a new model with technical details, but not from a frontier lab.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

X — Fireworks (inference infra) TIER_1 English(EN) · FireworksAI_HQ · 2026-06-01 23:34

Many research labs only consider inference efficiency after the fact. Step 3.7 Flash is a 196B MoE model, and built for inference from the start by @StepFun_ai. Multi-Matrix Factorization Attention (MFA) → KV-cache at ~22% of DeepSeek. Attention-FFN Disaggregation (AFD) →
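
To put the "~22%" KV-cache figure in context, here is a minimal sketch of how cache size limits the number of sequences a server can hold in memory. The layer count, head count, head dimension, context length, and 40 GiB budget are hypothetical placeholders, not the configuration of Step 3.7 Flash or any DeepSeek model. The formula is the one for a standard attention cache, and the 0.22 factor is applied only as the ratio quoted in the post.

```python
GIB = 2**30

def kv_cache_bytes(num_layers, kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Bytes needed to cache keys and values for standard attention.

    The leading factor of 2 covers one key tensor plus one value tensor.
    bytes_per_value=2 corresponds to a 16-bit format such as FP16 or BF16.
    """
    return (2 * num_layers * kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)

def sequences_that_fit(budget_bytes, per_sequence_bytes):
    """How many full-length sequences fit in a fixed cache budget."""
    return int(budget_bytes // per_sequence_bytes)

# Hypothetical baseline model (NOT a real model's configuration).
baseline = kv_cache_bytes(num_layers=60, kv_heads=8, head_dim=128,
                          seq_len=32768)

# Apply the ratio quoted in the post as a plain scaling factor.
QUOTED_RATIO = 0.22
reduced = QUOTED_RATIO * baseline

# Hypothetical memory budget reserved for the KV cache.
budget = 40 * GIB

baseline_fit = sequences_that_fit(budget, baseline)
reduced_fit = sequences_that_fit(budget, reduced)

print(f"baseline cache per sequence: {baseline / GIB:.2f} GiB")
print(f"reduced cache per sequence:  {reduced / GIB:.2f} GiB")
print(f"sequences in budget: {baseline_fit} vs {reduced_fit}")
```

Under these assumed numbers, a cache at 22% of the baseline lets several times as many long sequences share the same memory, which is one route by which a smaller KV cache can lower serving cost.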
