PulseAugur / Brief

last 24h
[1/1] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. Hearth: scale-to-zero LLM serving on Kubernetes — and you can hack on it without a GPU

    New Kubernetes operators are emerging to cut the cost of self-hosting large language models, chiefly the money burned by idle GPUs. Hearth, an alpha-stage operator, lets users declaratively serve open-source LLMs and scale them down to zero when not in use, buffering requests during cold starts. A separate approach builds a KEDA external scaler on top of NVML so that autoscaling follows actual GPU utilization, reducing the need for a full metrics stack like Prometheus.

    IMPACT Enables cost-effective self-hosting of LLMs by reducing idle GPU expenditure.
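
    To make the two ideas in the summary concrete, here is a minimal Python sketch. It is not Hearth's code and not a real KEDA scaler. The names `ColdStartBuffer`, `is_active`, `desired_replicas`, `start_backend`, `handle`, and the 5% idle threshold are all hypothetical. A real external scaler would expose its decisions over KEDA's gRPC interface and read utilization from NVML; here the utilization values are passed in as plain numbers so the sketch runs without a GPU.

```python
import asyncio
import math
from typing import Sequence


def is_active(gpu_utilization: Sequence[float], pending_requests: int,
              idle_threshold: float = 5.0) -> bool:
    """Decide whether the workload needs at least one replica.

    gpu_utilization holds per-GPU percentages (assumed to come from NVML
    in a real scaler). idle_threshold is an illustrative value.
    """
    busy = any(u > idle_threshold for u in gpu_utilization)
    return busy or pending_requests > 0


def desired_replicas(current_replicas: int, mean_utilization: float,
                     target_utilization: float) -> int:
    """Kubernetes HPA rule: ceil(current * metric / target)."""
    return math.ceil(current_replicas * mean_utilization / target_utilization)


class ColdStartBuffer:
    """Holds requests while a scaled-to-zero backend boots, then releases them."""

    def __init__(self, start_backend, handle):
        self._start_backend = start_backend  # hypothetical coroutine: boots the model server
        self._handle = handle                # hypothetical coroutine: serves one request
        self._ready = asyncio.Event()
        self._boot_task = None
        self.cold_starts = 0

    async def request(self, payload):
        if not self._ready.is_set():
            if self._boot_task is None:
                # First request after scale-to-zero triggers exactly one boot.
                self.cold_starts += 1
                self._boot_task = asyncio.create_task(self._boot())
            await self._ready.wait()  # later requests queue here until ready
        return await self._handle(payload)

    async def _boot(self):
        await self._start_backend()
        self._ready.set()
        self._boot_task = None

    def scale_to_zero(self):
        """Mark the backend as stopped; the next request boots it again."""
        self._ready.clear()


async def demo():
    async def start_backend():
        await asyncio.sleep(0.05)  # stand-in for loading model weights

    async def handle(payload):
        return payload.upper()  # stand-in for inference

    buf = ColdStartBuffer(start_backend, handle)
    results = await asyncio.gather(*(buf.request(p) for p in ["a", "b", "c"]))
    return results, buf.cold_starts
```

    Calling `asyncio.run(demo())` sends three concurrent requests at a stopped backend. All three wait on the same boot, so only one cold start is counted. `is_active` returning False is the condition under which a scaler would allow zero replicas.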