PulseAugur
significant · [3 sources] · Chinese (ZH) · The fastest among the top-tier players! Zhipu, are you "spraying" code?

Zhipu AI launches GLM-5.1-highspeed API at 400 tokens/s

Zhipu AI has released GLM-5.1-highspeed, an API variant of its GLM-5.1 model that generates output at 400 tokens per second. It is positioned as the fastest offering among leading global LLM providers, and the coverage reports rapid code generation and content summarization in real-world tests. The speedup is attributed to system-engineering optimizations in the inference engine, the scheduling system, and the underlying infrastructure. The stated aim is a better experience for AI agents through shorter wait times and more frequent feedback.
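
To put the 400 tokens/s figure in terms of wait time, the arithmetic can be sketched as below. This is a minimal illustration only: the 2,000-token response length and the 50 tokens/s comparison rate are hypothetical values chosen for the example, not figures from the sources, and the calculation ignores time to first token and network latency.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream `output_tokens` at a steady decode rate.

    Ignores time to first token and network latency.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


REPORTED_RATE = 400.0  # tokens/s, the figure reported for GLM-5.1-highspeed
BASELINE_RATE = 50.0   # tokens/s, hypothetical comparison rate (assumption)
OUTPUT_TOKENS = 2_000  # hypothetical response length (assumption)

fast = generation_seconds(OUTPUT_TOKENS, REPORTED_RATE)
slow = generation_seconds(OUTPUT_TOKENS, BASELINE_RATE)

print(f"{OUTPUT_TOKENS} tokens at {REPORTED_RATE:.0f} tokens/s: {fast:.1f} s")
print(f"{OUTPUT_TOKENS} tokens at {BASELINE_RATE:.0f} tokens/s: {slow:.1f} s")
```

Under these assumptions, the same response streams in 5 seconds instead of 40. An agent that makes many sequential model calls would see that difference repeated at every step.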

Summary written by gemini-2.5-flash-lite from 3 sources.

IMPACT Faster generation shortens wait times for AI agents and makes real-time interaction more practical.

RANK_REASON Model release from a frontier lab with a new speed benchmark.


COVERAGE [3]

  1. 量子位 (QbitAI) TIER_1 Chinese (ZH) · 十三

    The fastest among the top-tier players! Zhipu, are you "spraying" code?

    400 tokens/s

  2. Pandaily TIER_1 · Pandaily

    Zhipu AI Launches GLM-5.1 High-Speed API: 400 Tokens/s Sets New Global Benchmark

    Zhipu AI has launched GLM-5.1-highspeed, an API variant of its GLM-5.1 model delivering 400 tokens per second — reportedly the fastest inference speed among major global LLM providers.

  3. Mastodon (mastodon.social) TIER_1

    Zhipu AI has launched GLM-5.1-highspeed, a high-speed API variant of its GLM-5.1 large language model, delivering 400 tokens per second and reportedly setting a new global benchmark for inference speed among major LLM providers. The API targets enterprise applications requiring r…