PulseAugur

User asks about dual-GPU performance for local LLMs

A user on Reddit's r/LocalLLaMA is asking for advice on planning hardware to run large language models locally. They can currently run a 16-billion-parameter dense model at Q4 quantization, with a reasonable context size, on a single 16 GB GPU (an AMD Radeon RX 9070 XT). They want to know whether adding a second 16 GB GPU would let them run a 32-billion-parameter dense model at similar speed, or whether PCIe bandwidth between the two cards would slow generation down.
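A rough back-of-the-envelope sketch of the question follows. It rests on assumptions that are not from the post: a Q4 quant is taken to average about 4.5 bits per weight (actual quants vary), the KV cache and runtime overhead are ignored, single-stream generation is treated as memory-bandwidth bound, and the 640 GB/s per-card bandwidth and the function names are illustrative placeholders, not measurements.

```python
# Back-of-the-envelope estimate for dense-model decoding. All constants are
# illustrative assumptions, not measurements.

BITS_PER_WEIGHT_Q4 = 4.5   # assumed average for a Q4 quant, including scales
BANDWIDTH_GB_S = 640.0     # assumed per-card VRAM bandwidth (hypothetical value)


def weight_gb(params_billion: float, bits_per_weight: float = BITS_PER_WEIGHT_Q4) -> float:
    """Approximate size of the quantized weights in GB (KV cache and overhead ignored)."""
    return params_billion * bits_per_weight / 8


def max_tokens_per_s(params_billion: float, bandwidth_gb_s: float = BANDWIDTH_GB_S) -> float:
    """Ceiling on single-stream decode speed if every token streams all weights once.

    With a layer-by-layer split, the cards work one after the other, so each
    token still reads the full weight set at roughly one card's bandwidth.
    """
    return bandwidth_gb_s / weight_gb(params_billion)


for params, gpus in [(16, 1), (32, 2)]:
    total = weight_gb(params)
    print(f"{params}B on {gpus} GPU(s): {total:.1f} GB weights, "
          f"{total / gpus:.1f} GB per card, ~{max_tokens_per_s(params):.0f} tok/s ceiling")
```

Under these assumptions the 32B weights (about 18 GB) would fit when split roughly 9 GB per card. With a layer split, however, each token still has to read twice as many bytes of weights, so the estimated speed ceiling is about half that of the 16B model rather than the same. In this simplified model the PCIe link mainly carries the small per-token activations passed between the cards at the split point; real results depend on the backend and split mode.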

IMPACT N/A

RANK_REASON User question about hardware configuration for LLMs.

Read on r/LocalLLaMA →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA · TIER_1 · English · /u/TrainingTwo1118

    16B dense on 16GB GPU vs 32B dense on 2x 16GB GPU

    I'm currently trying to plan a build to run big(-ish) LLMs locally, and was wondering the following:

    I'm able to run a 16B dense model at Q4 with reasonable context size on a single 16GB VRAM GPU (9070 XT).

    If I were to add a second…