Zhipu AI has released GLM-5.1-highspeed, a new API for its GLM-5.1 model that reaches an inference speed of 400 tokens per second. The offering is positioned as the fastest among leading global LLM providers and has performed well in real-world tests, including rapid code generation and content summarization. The speedup is attributed to system engineering optimizations in the inference engine, the scheduling system, and the underlying infrastructure. The aim is to improve the user experience for AI agents by reducing wait times and increasing feedback frequency.
Summary written by gemini-2.5-flash-lite from 3 sources.
IMPACT Accelerates AI agent responsiveness and real-time interaction capabilities across various applications.
RANK_REASON Model release from a frontier lab with a new speed benchmark.