PulseAugur

Fireworks AI announces Step 3.7 Flash, StepFun's 196B MoE model built for inference

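Fireworks AI has announced Step 3.7 Flash, a sparse Mixture-of-Experts (MoE) vision-language model designed by StepFun. The model totals about 198 billion parameters: a 196-billion-parameter language backbone plus a 1.8-billion-parameter vision encoder. According to Fireworks, it was built for inference efficiency from the start and targets real-world agent workloads, whereas many research labs consider inference efficiency only after the fact. The posts also cite Multi-Matrix Factorization Attention (MFA), said to bring the KV cache to roughly 22% of DeepSeek's, and Attention-FFN Disaggregation (AFD).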
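The 196B and 198B figures that appear in the coverage are consistent with each other once the vision encoder is counted. The short Python sketch below only adds the two numbers quoted in the Fireworks posts; it assumes the quoted total covers no components beyond those two, which the posts do not confirm.

```python
# Parameter counts as quoted in the Fireworks posts, in billions.
LANGUAGE_BACKBONE_B = 196.0  # MoE language backbone
VISION_ENCODER_B = 1.8       # vision encoder


def total_params_b(backbone_b: float, vision_b: float) -> float:
    """Sum the component sizes (assumption: nothing else is counted)."""
    return backbone_b + vision_b


total = total_params_b(LANGUAGE_BACKBONE_B, VISION_ENCODER_B)
print(f"total = {total:.1f}B, which rounds to {round(total)}B")
```

On this reading, "196B" refers to the language backbone alone and "198B" to the full vision-language model.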

IMPACT This model release could offer a more efficient option for inference, potentially lowering costs for AI deployments.

RANK_REASON The cluster describes the release of a new model, but it is not from a tier-1 frontier lab and does not claim state-of-the-art performance on major benchmarks.

Read on X — Fireworks (inference infra) →

AI-generated summary · Google Gemini · from 2 sources. How we write summaries →

COVERAGE [2]

  1. X — Fireworks (inference infra) TIER_1 English(EN) · FireworksAI_HQ ·

    Many research labs only consider inference efficiency after the fact. Step 3.7 Flash is a 198B sparse MoE VLM designed by @StepFun_ai for inference from the start. 196B language backbone with a 1.8B vision encoder. Built for real-world agent workloads, running at up to 400 https…

  2. X — Fireworks (inference infra) TIER_1 English(EN) · FireworksAI_HQ ·

    Many research labs only consider inference efficiency after the fact. Step 3.7 Flash is a 196B MoE model, and built for inference from the start by @StepFun_ai. Multi-Matrix Factorization Attention (MFA) → KV-cache at ~22% of DeepSeek. Attention-FFN Disaggregation (AFD) →