GB200 NVL72 serving costs slashed 2.5x via software upgrades

By PulseAugur Editorial · [1 source] · 2026-06-22 17:00

Software optimizations have cut GB200 NVL72 serving costs by 2.5x in under 70 days. Key changes include rewriting the NVFP4 MoE kernel in CuTe-DSL and leveraging the NVL72's high-bandwidth copper backplane. The improvements were applied to the Kimi architecture, which SemiAnalysis describes as the same model architecture as xAI's Cursor Composer 2.5. The result shows how much software engineering alone can improve AI infrastructure efficiency.
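To make the headline figure concrete: a 2.5x reduction means the cost falls to 1/2.5 of its starting value, which is a 60% saving rather than a 250% one. The following is a minimal sketch of that arithmetic. The function names and the baseline cost of 1.00 are hypothetical and chosen only for illustration; the source reports no absolute cost figures.

```python
def remaining_cost_fraction(reduction_factor: float) -> float:
    """Fraction of the original cost left after an N-times reduction."""
    if reduction_factor <= 0:
        raise ValueError("reduction_factor must be positive")
    return 1.0 / reduction_factor


def percent_saved(reduction_factor: float) -> float:
    """Percentage of the original cost that the reduction removes."""
    return (1.0 - remaining_cost_fraction(reduction_factor)) * 100.0


if __name__ == "__main__":
    # Hypothetical baseline: 1.00 cost units per unit of serving work.
    # This number is an assumption, not a figure from the source.
    baseline_cost = 1.00
    factor = 2.5  # the reduction factor reported by SemiAnalysis

    new_cost = baseline_cost * remaining_cost_fraction(factor)
    print(f"cost after reduction: {new_cost:.2f} of baseline")
    print(f"saving: {percent_saved(factor):.0f}%")
```

The same conversion applies to any "Nx cheaper" claim: the remaining cost is 1/N of the original.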

IMPACT Demonstrates substantial potential for cost savings in AI model serving through targeted software engineering.

RANK_REASON Significant cost reduction in AI hardware serving through software optimization, impacting infrastructure.

infra

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

X — SemiAnalysis TIER_1 English(EN) · SemiAnalysis_ · 2026-06-22 17:00

CUDA MOAT ALERT 🔥: In less than 70 days, GB200 NVL72 serving costs decreased by 2.5x through software improvements alone for the Kimi architecture, which is the same model architecture as xAI’s popular Cursor Composer 2.5. One of the key software optimizations was rewriting the h…
