Brief · PulseAugur

TOOL · dev.to — LLM tag English(EN) · 2w · [4 sources]

Hearth: scale-to-zero LLM serving on Kubernetes — and you can hack on it without a GPU

New Kubernetes tooling is emerging to cut the cost of running large language models, in particular the money spent on GPUs that sit idle. Hearth, an alpha-stage operator, lets users serve open-source LLMs declaratively and scale them down to zero when not in use, buffering incoming requests during cold starts. A second approach builds a KEDA external scaler on top of NVML so that autoscaling follows actual GPU utilization, reducing the need for a full metrics stack such as Prometheus.

IMPACT Enables cost-effective self-hosting of LLMs by reducing idle GPU expenditure.
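
To make the scale-to-zero idea concrete, here is a minimal Python sketch of request buffering during a cold start. It is an illustration under stated assumptions, not Hearth's actual code: the class name `ColdStartBuffer`, the `start_backend` and `handle` callbacks, and the 50 ms simulated startup are all hypothetical. The first request that arrives while the backend is down triggers a single startup, and every request received in the meantime waits until the backend reports ready.

```python
import asyncio


class ColdStartBuffer:
    """Hold requests while a scaled-to-zero backend starts (hypothetical sketch)."""

    def __init__(self, start_backend, handle):
        self._start_backend = start_backend  # coroutine function: brings the backend up
        self._handle = handle                # coroutine function: serves one request
        self._ready = asyncio.Event()        # set once the backend can take traffic
        self._startup_task = None
        self.cold_starts = 0                 # how many times a startup was triggered

    async def submit(self, request):
        if not self._ready.is_set():
            # Only the first waiting request triggers a startup;
            # later ones wait on the same event.
            if self._startup_task is None:
                self.cold_starts += 1
                self._startup_task = asyncio.create_task(self._warm_up())
            await self._ready.wait()
        return await self._handle(request)

    async def _warm_up(self):
        await self._start_backend()
        self._ready.set()

    def scale_to_zero(self):
        # Mark the backend as down so the next request triggers a new cold start.
        self._ready.clear()
        self._startup_task = None


async def demo():
    async def start_backend():
        # Stand-in for pod scheduling and model loading (assumed 50 ms here).
        await asyncio.sleep(0.05)

    async def handle(request):
        return f"echo:{request}"

    buffer = ColdStartBuffer(start_backend, handle)
    # Three requests arrive while the backend is still at zero replicas.
    results = await asyncio.gather(*(buffer.submit(i) for i in range(3)))
    return results, buffer.cold_starts


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A production buffer would also need a startup timeout and a cap on queued requests, which this sketch leaves out for brevity.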

Hearth
LLMs
NVIDIA
Qwen
DeepSeek
Kubernetes
OpenAI
LLM
vLLM
KEDA
NVML
GPU