PulseAugur

Qwen3.6-27B model achieves 100+ t/s with dual-GPU tensor split

A user on Reddit's r/LocalLLaMA subreddit shared their setup for running the Qwen3.6-27B model at Q8_0 quantization across an RTX 5090 and an RTX 3090 Ti. Switching --split-mode from layer to tensor raised throughput from about 70 to over 100 tokens per second. The configuration uses a 70/30 tensor split favoring the more powerful 5090, and the GPUs alone draw over 750W.
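As a rough illustration of what a 70/30 split means for memory, the sketch below estimates how much of the quantized weights each GPU would hold. It assumes exactly 27e9 parameters stored entirely as Q8_0 (8.5 bits per weight, since each block of 32 int8 values carries one fp16 scale) and ignores KV cache, activations, and runtime overhead. None of these figures come from the post itself, and the function name is hypothetical.

```python
# Rough estimate of how a 70/30 tensor split distributes Q8_0 weights.
# Assumptions (not from the post): exactly 27e9 parameters, all stored as Q8_0,
# and no allowance for KV cache, activations, or runtime overhead.

# Q8_0 stores 32 int8 values plus one fp16 scale per block: 34 bytes / 32 weights.
Q8_0_BITS_PER_WEIGHT = 8.5


def split_weights_gb(n_params: float, ratios: list[float]) -> list[float]:
    """Return the approximate weight size in GB (1e9 bytes) assigned to each GPU."""
    total_gb = n_params * Q8_0_BITS_PER_WEIGHT / 8 / 1e9
    ratio_sum = sum(ratios)
    return [total_gb * r / ratio_sum for r in ratios]


per_gpu = split_weights_gb(27e9, [70, 30])
for name, gb in zip(["RTX 5090", "RTX 3090 Ti"], per_gpu):
    print(f"{name}: ~{gb:.1f} GB of weights")

# Speedup implied by the reported figures (about 70 t/s before, about 100 t/s after).
print(f"speedup: ~{100 / 70:.2f}x")
```

Under these assumptions the 5090 would hold roughly 20 GB of weights and the 3090 Ti under 9 GB, which suggests the 70/30 ratio tracks relative GPU speed more than raw memory capacity.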

IMPACT Demonstrates efficient multi-GPU inference configurations for local LLM deployment.

RANK_REASON User-shared configuration for running a specific LLM on consumer hardware.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. r/LocalLLaMA TIER_1 English(EN) · /u/Shoddy_Bed3240

    100+ t/s on Qwen3.6-27B Q8 across a 5090 + 3090 Ti — switching to tensor split-mode got me from 70 to 100+

    Wanted to share a setup that's been working great for me. Running Qwen3.6-27B at Q8_0 across two GPUs (RTX 5090 + RTX 3090 Ti) and getting ~100 t/s.

    The big jump came from switching --split-mode to tensor. I was s…