Zai replaced the network architecture running GLM-5.1 inference and the gains are pretty wild
Zai has significantly improved the performance and reduced costs of its GLM-5.1 inference cluster by implementing a new network architecture called ZCube. This custom design, developed with Tsinghua University and HarnetsAI, replaces the standard ROFT setup and addresses inefficiencies in traffic patterns during disaggregated inference. The result is a 33% reduction in hardware costs and a 15% increase in GPU inference throughput, alongside a substantial decrease in latency.
AI IMPACT: Optimized network architecture for AI inference can lead to lower operational costs and faster model deployment.