PulseAugur

LocalLLaMA users share 16GB VRAM LLM setups for coding

Users on the r/LocalLLaMA subreddit are discussing which local large language model (LLM) setups work best on hardware with 16GB of VRAM and 64GB of system RAM. The conversation focuses on identifying the best models and quantization methods for tasks such as coding and agentic workflows. Participants are sharing specific model names, quantization levels, and llama.cpp command-line settings to help others get the most performance out of similar hardware.
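Whether a given model and quantization level fits comes down to rough arithmetic: parameter count times bits per weight gives the size of the weights, and whatever does not fit in VRAM has to be offloaded to system RAM. The sketch below is a hypothetical helper, not part of llama.cpp. It assumes approximate average bits-per-weight figures for common GGUF quantization types and reserves some VRAM for the KV cache and compute buffers. Real file sizes vary by model architecture, and llama.cpp offloads whole layers (via `-ngl` / `--n-gpu-layers`), so the actual split will differ.

```python
# Hypothetical back-of-the-envelope estimator (not a llama.cpp tool).
# Assumption: these bits-per-weight values are approximate averages; actual
# GGUF sizes depend on the model, and context length adds KV-cache memory.
APPROX_BITS_PER_WEIGHT = {
    "Q8_0": 8.5,
    "Q6_K": 6.5625,
    "Q5_K_M": 5.7,   # approximate
    "Q4_K_M": 4.85,  # approximate
}


def weight_gib(params_billion: float, quant: str) -> float:
    """Approximate size of the model weights in GiB."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    total_bytes = params_billion * 1e9 * bits / 8
    return total_bytes / 2**30


def split_across(params_billion: float, quant: str,
                 vram_gib: float = 16.0, reserve_gib: float = 2.0,
                 ram_gib: float = 64.0):
    """Return (GiB on GPU, GiB offloaded to RAM), or None if the weights don't fit."""
    size = weight_gib(params_billion, quant)
    # Leave headroom in VRAM (assumed 2 GiB) for the KV cache and buffers.
    gpu_budget = max(vram_gib - reserve_gib, 0.0)
    on_gpu = min(size, gpu_budget)
    offloaded = size - on_gpu
    if offloaded > ram_gib:
        return None
    return on_gpu, offloaded


if __name__ == "__main__":
    for params, quant in [(14, "Q4_K_M"), (32, "Q4_K_M"), (32, "Q8_0")]:
        print(params, quant, split_across(params, quant))
```

Under these assumptions, a 14B model at Q4_K_M (about 7.9 GiB) fits entirely in 16GB of VRAM, while a 32B model at the same level (about 18 GiB) spills roughly 4 GiB into system RAM, which typically slows generation.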

IMPACT Users are sharing practical advice on running LLMs locally, which can inform others about hardware limitations and software optimizations.

RANK_REASON This is a user discussion forum post about running LLMs locally, not a primary source release or significant industry event.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA TIER_1 English (EN) · /u/whatyathinkk

    What are you running on 16GB VRAM + 64GB RAM?

    I know this gets asked a lot, but I can only find threads that are at least a couple of months old, so I thought I'd ask what people are running these days.

    I have an RTX 5080 and 64GB of DDR5 RAM. What's the best I can run for coding? …