Zai cuts AI inference costs by 33% with new ZCube network

By PulseAugur Editorial · [1 source] · 2026-05-28 13:09

Zai has cut hardware costs and raised throughput on its GLM-5.1 inference cluster by deploying a new network architecture called ZCube. The custom design, developed with Tsinghua University and HarnetsAI, replaces the standard ROFT setup and addresses inefficient traffic patterns that arise during disaggregated inference. Zai reports a 33% reduction in hardware costs and a 15% increase in GPU inference throughput, along with a substantial decrease in latency.
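To see how the two headline figures might combine, here is a rough back-of-the-envelope sketch. It assumes, which the source does not state, that the 33% cost reduction and the 15% throughput gain are measured against the same baseline cluster and apply together. The function name and the normalization to a baseline of 1.0 are illustrative only.

```python
def relative_cost_per_throughput(cost_reduction: float, throughput_gain: float) -> float:
    """Hardware cost per unit of throughput, relative to a baseline of 1.0.

    Assumption (not from the source): both percentages refer to the same
    baseline cluster and hold at the same time.
    """
    new_cost = 1.0 - cost_reduction          # e.g. 0.67 of the baseline hardware cost
    new_throughput = 1.0 + throughput_gain   # e.g. 1.15 of the baseline throughput
    return new_cost / new_throughput


# Figures reported in the summary: 33% lower hardware cost, 15% higher throughput.
ratio = relative_cost_per_throughput(0.33, 0.15)
print(f"Relative hardware cost per unit throughput: {ratio:.3f}")
print(f"Implied reduction: {(1.0 - ratio) * 100:.1f}%")
```

Under those assumptions the hardware cost per unit of throughput would fall to roughly 58% of the baseline. If the two figures were measured under different configurations, this combined number would not hold.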

IMPACT Optimized network architecture for AI inference can lead to lower operational costs and faster model deployment.

RANK_REASON The cluster describes a technical improvement to AI inference infrastructure, detailing specific performance gains and cost reductions, which falls under research into AI systems.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/Scared-Biscotti2287 · 2026-05-28 13:09

Zai replaced the network architecture running GLM-5.1 inference and the gains are pretty wild

