Idle GPU power cost driven by CUDA context, not VRAM

By PulseAugur Editorial · [1 source] · 2026-05-26 04:00

Researchers have quantified the energy cost of keeping AI models loaded on GPUs, a practice known as "model parking." The study found that the main source of idle energy drain is the CUDA context itself, which adds 26 to 66 W of idle power regardless of GPU architecture or memory type. The amount of VRAM a model occupies has a negligible effect on idle power. The findings suggest that energy-efficient deployment should focus on reducing cold-start latency, so that idle models can be unloaded without a large latency penalty, rather than relying on keeping models loaded around the clock.
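
As a rough illustration of what the reported overhead means in energy terms, the Python sketch below converts the 26 to 66 W CUDA-context range into annual energy for one GPU parked continuously, and computes a break-even idle time for unloading. It assumes that unloading means tearing down the CUDA context, since per the study freeing VRAM alone barely changes idle draw. The reload energy `cold_start_wh` is a hypothetical input, not a figure from the paper, and the function names are illustrative only.

```python
"""Back-of-envelope cost of 'parking' a model on a GPU (illustrative sketch)."""

HOURS_PER_YEAR = 24 * 365  # 8760 h; ignores leap years

# Idle overhead attributed to the CUDA context, in watts (range from the summary).
CONTEXT_OVERHEAD_W = (26.0, 66.0)


def parking_energy_kwh(overhead_w: float, hours: float) -> float:
    """Energy in kWh drawn by the CUDA context alone over `hours` of idling."""
    return overhead_w * hours / 1000.0


def break_even_idle_hours(overhead_w: float, cold_start_wh: float) -> float:
    """Idle time (h) after which tearing down the context saves energy.

    Unloading wins once overhead_w * idle_hours exceeds the energy needed
    to reload the model; `cold_start_wh` is a hypothetical input.
    """
    if overhead_w <= 0:
        raise ValueError("overhead_w must be positive")
    return cold_start_wh / overhead_w


if __name__ == "__main__":
    for watts in CONTEXT_OVERHEAD_W:
        kwh = parking_energy_kwh(watts, HOURS_PER_YEAR)
        print(f"{watts:.0f} W context overhead -> {kwh:.1f} kWh per GPU-year")
    # Hypothetical reload cost of 5 Wh, at the low end of the overhead range.
    print(f"break-even idle time: {break_even_idle_hours(26.0, 5.0):.2f} h")
```

For one GPU this works out to roughly 228 to 578 kWh per year attributable to the context alone. With the hypothetical 5 Wh reload cost, unloading at the 26 W end pays off after about 0.19 h (around 12 minutes) of idle time; the sketch ignores the latency penalty of reloading, which is the trade-off the study points to.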

IMPACT Identifies a significant, previously unquantified energy cost in AI inference, suggesting new optimization strategies for deployment.

RANK_REASON Academic paper detailing empirical findings on GPU energy consumption.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.LG TIER_1 English(EN) · Sai Sathvik Vadari · 2026-05-26 04:00

The Model Parking Tax: Quantifying the Hidden Energy Cost of Always-On GPU Model Deployment

arXiv:2605.23918v1 · Announce Type: cross

Abstract: The AI inference industry keeps models loaded in GPU memory around the clock to avoid cold-start latency, implicitly treating idle power as a fixed cost of readiness. Yet the structure of this cost has never been empirically decom…