nvidia-smi
PulseAugur coverage of nvidia-smi — every cluster mentioning nvidia-smi across labs, papers, and developer communities, ranked by signal.
-
DGX Spark GPU overheating solved by clock-locking with nvidia-smi
A developer has found a workaround for overheating on the DGX Spark GPU when running LLM workloads with Ollama and Qwen2.5. The GPU, a GB10, lacks user-accessible power and fan controls, …
-
KV cache memory problem plagues LLM serving, vLLM's PagedAttention offers solution
The KV cache is a critical component in LLM inference: it stores the attention keys and values of earlier tokens so they are not recomputed for each new token. However, its memory footprint can become a significant bottleneck, especially in production …
-
Local LLM Hardware Guide: VRAM, Quantization, and Performance
Running large language models (LLMs) locally, particularly those with 70 billion parameters, presents significant hardware challenges, primarily concerning VRAM capacity. While marketing often suggests minimal requireme…
-
User doubles LLM inference speed by fixing PCIe slot bottleneck
A user building a multi-GPU setup for local LLM inference discovered a significant performance bottleneck caused by a misconfigured PCIe slot. One of the four RTX 3090 GPUs was incorrectly placed in a slot that only sup…
-
Utilyze offers open-source tool for deeper GPU performance insights beyond load
Utilyze is a new open-source tool designed to provide deeper insights into GPU performance beyond simple load percentages. It directly accesses GPU performance counters to measure the actual utilization and efficiency o…