Researchers have quantified the energy cost of keeping AI models loaded on GPUs, a practice known as "model parking." Their study found that the main energy drain comes from the CUDA context itself, which adds 26–66 W of idle power regardless of GPU architecture or memory type, while the amount of VRAM allocated to the model has a negligible effect on idle power. The findings suggest that energy-efficient deployment strategies should focus on reducing cold-start latency, so that models can be unloaded when idle without a large latency penalty, rather than simply keeping them loaded indefinitely.
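To put the reported range in concrete terms, here is a rough sketch that converts the 26–66 W idle overhead into energy per GPU per day and per year. It also computes a simple break-even idle time for unloading a model. The reload cost of 3,000 J is a hypothetical placeholder, not a figure from the study. The break-even logic assumes that unloading also releases the CUDA context (for example, by ending the serving process). The function names are illustrative only.

```python
# Idle power overhead attributed to the CUDA context in the study (watts).
IDLE_POWER_W = (26.0, 66.0)

HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def parked_energy_kwh(idle_power_w: float, hours: float) -> float:
    """Extra energy (kWh) drawn by one GPU holding a model with no requests."""
    return idle_power_w * hours / 1000.0


def break_even_idle_seconds(cold_start_energy_j: float, idle_power_w: float) -> float:
    """Idle gap (s) beyond which unloading uses less energy than staying parked.

    Staying parked for t seconds costs idle_power_w * t joules. Unloading and
    later reloading costs cold_start_energy_j, a hypothetical, workload-specific
    value. Assumes unloading also tears down the CUDA context, removing the
    idle overhead entirely.
    """
    return cold_start_energy_j / idle_power_w


if __name__ == "__main__":
    for p in IDLE_POWER_W:
        per_day = parked_energy_kwh(p, HOURS_PER_DAY)
        per_year = parked_energy_kwh(p, HOURS_PER_DAY * DAYS_PER_YEAR)
        print(f"{p:.0f} W idle: {per_day:.3f} kWh/day, {per_year:.1f} kWh/year")

    # Hypothetical reload cost; replace with a measured value for a real deployment.
    reload_j = 3000.0
    for p in IDLE_POWER_W:
        t = break_even_idle_seconds(reload_j, p)
        print(f"{p:.0f} W idle: unloading pays off after ~{t:.0f} s of inactivity")
```

Across the study's range, parking costs roughly 0.6–1.6 kWh per GPU per day. The break-even time shrinks as idle power rises or as reloads get cheaper, which is why faster cold starts make unloading more attractive. This sketch considers energy only. It ignores the latency a user sees during a cold start.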
IMPACT: Identifies a significant, previously unquantified energy cost in AI inference and suggests new optimization strategies for deployment.
RANK_REASON: Academic paper detailing empirical findings on GPU energy consumption.
AI-generated summary · Google Gemini · from 1 source.