Dumb question: How would performance be if you took a used server with around 80 PCIe 5.0 lanes and filled them with NVMe drives to run models?
A user on r/LocalLLaMA asks about using a used server with many PCIe 5.0 lanes to host large language models. The idea is to populate those lanes with NVMe SSDs so that their combined bandwidth acts as a high-throughput storage tier, one the user suggests could theoretically approach VRAM speeds for running models of 1-2 TB. The user asks why this approach isn't more common for self-hosting very large models.
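To put rough numbers on the idea, here is a back-of-envelope sketch in Python. It is not a benchmark. It assumes the theoretical PCIe 5.0 rate of 32 GT/s per lane with 128b/130b encoding, one x4 link per NVMe drive, perfect striping across drives, and a dense model whose full weights are streamed from storage once per generated token with no caching in RAM. The function names and the 1000 GB/s VRAM comparison figure are illustrative assumptions, not numbers from the post.

```python
# Back-of-envelope estimate only: theoretical link maxima, not measured throughput.

PCIE5_GT_PER_S = 32.0             # PCIe 5.0 raw transfer rate per lane (GT/s)
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line encoding overhead
LANES_PER_DRIVE = 4               # assumption: each NVMe drive uses an x4 link


def lane_bandwidth_gbs() -> float:
    """Theoretical one-direction bandwidth of a single PCIe 5.0 lane, in GB/s."""
    return PCIE5_GT_PER_S * ENCODING_EFFICIENCY / 8  # 8 bits per byte


def aggregate_bandwidth_gbs(lanes: int) -> float:
    """Theoretical combined bandwidth of `lanes` lanes, assuming perfect striping."""
    return lanes * lane_bandwidth_gbs()


def tokens_per_second(bandwidth_gbs: float, gb_read_per_token: float) -> float:
    """Upper bound on decode speed when every token requires reading
    `gb_read_per_token` GB of weights over a link of `bandwidth_gbs` GB/s."""
    return bandwidth_gbs / gb_read_per_token


if __name__ == "__main__":
    lanes = 80
    total = aggregate_bandwidth_gbs(lanes)
    print(f"drives at x4 each: {lanes // LANES_PER_DRIVE}")
    print(f"aggregate NVMe bandwidth: {total:.0f} GB/s (theoretical)")

    assumed_vram_gbs = 1000.0  # assumption: order of magnitude for a high-end GPU
    for model_gb in (1000.0, 2000.0):
        nvme_tps = tokens_per_second(total, model_gb)
        vram_tps = tokens_per_second(assumed_vram_gbs, model_gb)
        print(f"{model_gb:.0f} GB dense model: "
              f"{nvme_tps:.2f} tok/s from NVMe vs {vram_tps:.2f} tok/s at assumed VRAM rate")
```

Under these assumptions, 80 lanes give roughly 315 GB/s in aggregate, which works out to well under one token per second for a 1 TB dense model. Real drives, filesystem overhead, and random-access patterns would likely lower this further, while mixture-of-experts models that read only a fraction of their weights per token would raise it.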