What are you running on 16GB VRAM + 64GB RAM?
Users on the r/LocalLLaMA subreddit are discussing optimal local large language model (LLM) deployments for hardware configurations featuring 16GB of VRAM and 64GB of RAM. The conversation focuses on identifying the best models and quantization methods for tasks such as coding and agentic workflows. Participants are sharing specific model names, quantization levels, and command-line settings for llama.cpp to help others maximize performance on similar hardware.
AI IMPACT Users are sharing practical advice on running LLMs locally, which can inform others about hardware limitations and software optimizations.
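As a rough illustration of the sizing question behind the thread, the sketch below estimates whether a quantized model's weights would fit in 16GB of VRAM or would need to spill into the 64GB of system RAM. It is not taken from the discussion: the function names, the 4.5 bits-per-weight figure, and the 2GB allowance for context cache and runtime buffers are all illustrative assumptions, and real file sizes vary by model and quantization format.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GB (10^9 bytes).

    params_billion * 1e9 weights * bits_per_weight / 8 bytes, divided by 1e9.
    """
    return params_billion * bits_per_weight / 8


def placement(
    params_billion: float,
    bits_per_weight: float,
    vram_gb: float = 16.0,     # hardware from the thread title
    ram_gb: float = 64.0,      # hardware from the thread title
    overhead_gb: float = 2.0,  # assumed allowance for cache and buffers
) -> str:
    """Classify where a model of the given size could plausibly live."""
    need = weight_size_gb(params_billion, bits_per_weight) + overhead_gb
    if need <= vram_gb:
        return "fits in VRAM"
    if need <= vram_gb + ram_gb:
        return "needs partial CPU offload"
    return "does not fit"


if __name__ == "__main__":
    # Hypothetical model sizes, all at an assumed ~4.5 bits per weight.
    for size in (8, 30, 70):
        print(f"{size}B: {weight_size_gb(size, 4.5):.1f} GB weights, "
              f"{placement(size, 4.5)}")
```

An estimate like this only separates the obvious cases; actual throughput with partial offload depends on memory bandwidth and on how the layers are split between GPU and CPU.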