A user on Reddit's r/LocalLLaMA shared their setup for running the Qwen3.6-27B model at Q8 quantization. By switching from layer split to tensor split mode across an RTX 5090 and an RTX 3090 Ti, they reached over 100 tokens per second, a significant improvement over their previous layer split setup. The configuration uses a 70/30 tensor split favoring the more powerful 5090, and the GPUs alone draw over 750W.
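To make the 70/30 ratio concrete, here is a rough back-of-the-envelope sketch in Python of how the weights might divide between the two cards. It is not the poster's tooling: the function names are hypothetical, and the 8.5 bits-per-weight figure assumes a GGUF-style Q8_0 format, which the summary does not confirm. It also ignores KV cache, activations, and runtime overhead, and real tensor splitting may not divide memory exactly in proportion.

```python
def split_shares(ratios):
    """Normalize split ratios such as [70, 30] into fractions that sum to 1."""
    total = sum(ratios)
    if total <= 0:
        raise ValueError("split ratios must sum to a positive number")
    return [r / total for r in ratios]


def estimate_weight_gb(params_billion, bits_per_weight):
    """Approximate weight size in GB (1 GB = 1e9 bytes) for a quantized model."""
    return params_billion * bits_per_weight / 8


def per_gpu_weight_gb(params_billion, bits_per_weight, ratios):
    """Approximate weight footprint per GPU under a proportional split."""
    total_gb = estimate_weight_gb(params_billion, bits_per_weight)
    return [total_gb * share for share in split_shares(ratios)]


if __name__ == "__main__":
    # Assumptions: 27B parameters, ~8.5 bits per weight (Q8_0-style), 70/30 split.
    for name, gb in zip(["RTX 5090", "RTX 3090 Ti"],
                        per_gpu_weight_gb(27, 8.5, [70, 30])):
        print(f"{name}: ~{gb:.1f} GB of weights")
```

Under these assumptions the weights total roughly 28.7 GB, with about 20.1 GB landing on the 5090 and about 8.6 GB on the 3090 Ti, which would leave headroom on both cards for context and buffers.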
IMPACT Demonstrates efficient multi-GPU inference configurations for local LLM deployment.
RANK_REASON User-shared configuration for running a specific LLM on consumer hardware.
AI-generated summary · Google Gemini · from 1 source.