Brief · PulseAugur

TOOL · r/LocalLLaMA · English (EN) · 1d

1000 tps generation on Qwen3.6 27B with V100s

A user on Reddit's r/LocalLLaMA forum reported reaching 1000 tokens per second (tps) of generation speed with the Qwen3.6 27B model on NVIDIA V100 GPUs while handling 128 concurrent requests. For a single user (batch size 1), generation ran at approximately 80 tps, and prompt processing ran at around 3000 tps. The report makes no mention of multi-token prediction (MTP).
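To put the reported figures in proportion, the arithmetic can be sketched as below. This assumes the 1000 tps figure is the aggregate across all 128 concurrent requests, which the brief implies but does not state outright; the function and variable names are illustrative only and do not come from the original post.

```python
def per_request_tps(aggregate_tps: float, concurrent_requests: int) -> float:
    """Average generation speed seen by each request, assuming the
    aggregate throughput is shared evenly across all requests."""
    if concurrent_requests <= 0:
        raise ValueError("concurrent_requests must be positive")
    return aggregate_tps / concurrent_requests


# Figures as reported in the brief.
AGGREGATE_TPS = 1000.0   # assumed to be the total across all requests
CONCURRENCY = 128        # concurrent requests
SINGLE_USER_TPS = 80.0   # batch size 1

# Average speed each of the 128 requests would see.
per_stream = per_request_tps(AGGREGATE_TPS, CONCURRENCY)

# How much more total output batching yields compared with one user.
batching_gain = AGGREGATE_TPS / SINGLE_USER_TPS

print(f"per-request speed at 128 concurrent: {per_stream:.2f} tps")
print(f"aggregate gain over batch size 1:    {batching_gain:.1f}x")
```

Under that assumption, each request would average roughly 7.8 tps, about a tenth of the single-user speed, while total output would be 12.5 times higher than at batch size 1.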

IMPACT Demonstrates high inference throughput for a 27B-parameter model on V100 GPUs, potentially enabling more efficient local deployments.

Qwen3.6 27B
r/LocalLLaMA
V100s