Zai has significantly improved the performance and reduced costs of its GLM-5.1 inference cluster by implementing a new network architecture called ZCube. This custom design, developed with Tsinghua University and HarnetsAI, replaces the standard ROFT setup and addresses inefficiencies in traffic patterns during disaggregated inference. The result is a 33% reduction in hardware costs and a 15% increase in GPU inference throughput, alongside a substantial decrease in latency.
AI IMPACT Optimized network architecture for AI inference can lead to lower operational costs and faster model deployment.
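To put the two headline figures side by side, here is a rough back-of-the-envelope sketch. It assumes, which the summary does not confirm, that the 33% hardware cost reduction and the 15% throughput gain are both measured against the same baseline cluster; the function name is hypothetical and used only for illustration.

```python
# Illustrative arithmetic only. Assumes both quoted percentages are measured
# against the same baseline cluster, which the summary does not state.

def relative_cost_per_throughput(cost_reduction: float, throughput_gain: float) -> float:
    """Return the new hardware cost per unit of throughput as a fraction of the baseline."""
    return (1.0 - cost_reduction) / (1.0 + throughput_gain)

# Figures quoted in the summary: 33% lower hardware cost, 15% higher throughput.
ratio = relative_cost_per_throughput(0.33, 0.15)
print(f"Cost per unit of throughput relative to baseline: {ratio:.3f}")
```

Under that assumption, the hardware cost per unit of inference throughput would fall to roughly 58% of the baseline. If the cost figure covers only part of the hardware, such as the network, the combined effect would be smaller.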
RANK_REASON The cluster describes a technical improvement to AI inference infrastructure, detailing specific performance gains and cost reductions, which falls under research into AI systems.
AI-generated summary · Google Gemini · from 1 source.