User seeks advice on optimizing LLM performance with RTX 5090 and 64GB RAM

By PulseAugur Editorial · [1 source] · 2026-05-26 17:56

A user on the r/LocalLLaMA subreddit is asking how best to run large language models on a single NVIDIA RTX 5090 GPU paired with 64GB of DDR5 system RAM. They plan to run Qwen 3.6 27b in NVFP4 via vLLM, but wonder whether a 35b a3b model at Q8 on Llama would give better results for agentic coding while also making use of the system memory. Their own research so far suggests it would not.
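The memory side of this trade-off can be roughed out with simple arithmetic. The sketch below is an estimate under stated assumptions, not a measurement: parameter counts are taken at face value from the model names (27B and 35B), NVFP4 is assumed to cost about 4.5 bits per weight and Q8 about 8.5 bits per weight once block scales are included, and the RTX 5090 is taken to have 32 GB of VRAM (a figure not stated in the post). KV cache, activations, and any layers kept at higher precision are ignored, so real usage will be higher.

```python
# Rough weight-memory estimate. All constants are assumptions, not measurements.
GPU_VRAM_GB = 32.0  # assumed RTX 5090 VRAM; the post only mentions 64GB of system RAM


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


candidates = {
    "27B @ NVFP4 (~4.5 bits/weight assumed)": weight_gb(27, 4.5),
    "35B-A3B @ Q8 (~8.5 bits/weight assumed)": weight_gb(35, 8.5),
}

for name, gb in candidates.items():
    verdict = "fits within" if gb < GPU_VRAM_GB else "exceeds"
    print(f"{name}: ~{gb:.1f} GB of weights, {verdict} {GPU_VRAM_GB:.0f} GB of VRAM")
```

Under these assumptions the 27B NVFP4 weights leave room on the GPU for context, while the 35B Q8 weights alone are larger than the card's VRAM, so part of that model would have to sit in system RAM. Whether that offload is worthwhile for agentic coding is the question the poster is asking, and this estimate does not answer it.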

IMPACT Users are exploring hardware configurations to optimize local LLM performance for specific tasks like agentic coding.

RANK_REASON User-generated content seeking advice on hardware and model configuration for LLMs.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 Dansk(DA) · /u/icedgz · 2026-05-26 17:56

Looking for Suggestions — Single 5090 & 64gb DDR5

Hi Reddit,

I am planning on running Qwen 3.6 27b NVFP4 via vLLM on my 5090 but was wondering if something like 35b a3b at Q8 on Llama would produce better results for agentic coding and utilize the system memory. My research says no but if…
