Brief · PulseAugur

TOOL · r/LocalLLaMA English(EN) · 6h

I accidentally crippled my 4x RTX 3090 LLM rig with a hidden PCIe 2.0 x4 slot and fixing it doubled Mistral 128B performance

A user building a four-GPU rig for local LLM inference traced a significant performance bottleneck to GPU placement rather than software. One of the four RTX 3090 cards sat in a slot that ran at only PCIe 2.0 x4, which severely limited that card's bandwidth. After rearranging the GPUs so they could use their full PCIe capabilities, the user reported a dramatic increase in inference speed, with Mistral 128B performance nearly doubling.
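For readers who want to check their own rig for the same problem, here is a minimal sketch rather than the original poster's method. It reads each GPU's current PCIe generation and lane width through the `nvidia-smi --query-gpu` interface and flags links whose theoretical bandwidth falls below a threshold. The helper names, the 7 GB/s cutoff, and the per-lane figures (rounded, one direction, after encoding overhead) are assumptions chosen for illustration.

```python
import subprocess

# Approximate usable bandwidth per lane in GB/s, one direction, after
# line-encoding overhead (8b/10b for Gen 1-2, 128b/130b for Gen 3-4).
PER_LANE_GBPS = {1: 0.25, 2: 0.5, 3: 0.985, 4: 1.969}

# nvidia-smi query fields for the link the GPU is currently running at.
QUERY = "index,name,pcie.link.gen.current,pcie.link.width.current"


def link_bandwidth(gen, width):
    """Theoretical one-direction bandwidth in GB/s for a PCIe link."""
    return PER_LANE_GBPS[gen] * width


def parse_links(csv_text):
    """Parse `--format=csv,noheader` rows into one dict per GPU."""
    links = []
    for line in csv_text.strip().splitlines():
        index, name, gen, width = [field.strip() for field in line.split(",")]
        links.append(
            {"index": int(index), "name": name, "gen": int(gen), "width": int(width)}
        )
    return links


def slow_links(links, min_gbps=7.0):
    """Return GPUs whose link is below min_gbps (hypothetical cutoff)."""
    return [l for l in links if link_bandwidth(l["gen"], l["width"]) < min_gbps]


def query_links():
    """Run nvidia-smi on this machine and parse its output."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_links(result.stdout)
```

On a machine with the NVIDIA driver installed, `slow_links(query_links())` would list any under-provisioned cards. By this arithmetic a PCIe 2.0 x4 link tops out near 2 GB/s, against roughly 15.8 GB/s for PCIe 3.0 x16. GPUs may lower their reported link generation at idle to save power, so it is probably best to run the check while the cards are under load.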

IMPACT Finding and fixing hardware bottlenecks such as an under-provisioned PCIe slot can significantly improve local LLM inference speed.

llama.cpp
RTX 3090
Qwen3.6 27B
vLLM
PCIe
nvidia-smi
Linux
Mistral 128B
Gigabyte X399 Designare EX