Software optimizations for the GB200 NVL72 have cut serving costs by a factor of 2.5 in under 70 days, that is, to roughly 40% of the earlier cost. The gains came notably from rewriting the NVFP4 MoE kernel in CuTe-DSL and from exploiting the NVL72's high-bandwidth copper backplane. They were applied to the Kimi architecture, which also powers xAI's Cursor Composer 2.5. The result shows how much software engineering alone can improve the efficiency of AI infrastructure.
IMPACT Shows that targeted software engineering can deliver substantial cost savings in AI model serving.
RANK_REASON Significant cost reduction in AI hardware serving through software optimization, impacting infrastructure.
- Cursor Composer 2.5
- CuTe-DSL
- GB200 NVL72
- Jun Yang
- Kimi architecture
- NVFP4 MoE kernel
- NVIDIA
- xAI
- Xin Li
AI-generated summary · Google Gemini · from 1 source.