PulseAugur
EN
LIVE 00:09:30

Fireworks AI launches GLM 5.2 Fast for higher inference speeds · 2 sources tracked

Fireworks AI has released a faster version of the GLM 5.2 model, named GLM 5.2 Fast. This new iteration offers the same quality as the standard GLM 5.2 but achieves significantly higher inference speeds, reaching up to 140 tokens per second. The company also highlighted custom deployment options for even greater performance, noting speeds of 446 tokens per second on Artificial Analysis. AI

IMPACT Increases inference speed for LLMs, potentially lowering costs and improving real-time application performance.

RANK_REASON Model release from a frontier AI lab. [lever_c_demoted from frontier_release: ic=2 ai=1.0]

Read on X — Fireworks (inference infra) →

AI-generated summary · Google Gemini · from 2 sources. How we write summaries →

Fireworks AI launches GLM 5.2 Fast for higher inference speeds · 2 sources tracked

COVERAGE [2]

  1. X — Fireworks (inference infra) TIER_1 English(EN) · FireworksAI_HQ ·

    For even higher speeds, reach out for a custom deployment!

    For even higher speeds, reach out for a custom deployment! We’ve hit 446 tok/s on Artificial Analysis. Learn more → https://t.co/rpFJ2dIZvX

  2. X — Fireworks (inference infra) TIER_1 English(EN) · FireworksAI_HQ ·

    We heard your feedback. You want to go faster.

    We heard your feedback. You want to go faster. Introducing GLM 5.2 Fast The same model and quality as GLM 5.2 standard, now at 140 tok/s Flip one model ID → accounts/fireworks/routers/glm-5p2-fast https://t.co/jaYWA4lPi0