A user on Reddit's r/LocalLLaMA forum reported an aggregate generation throughput of 1000 tokens per second (tps) with the Qwen3.6 27B model, running on NVIDIA V100 GPUs while serving 128 concurrent requests. For a single user (batch size 1), generation reached approximately 80 tps, with prompt processing at around 3000 tps; the post made no mention of multi-token prediction (MTP) limitations.
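The two generation figures measure different things. The 1000 tps figure is the total across all 128 requests, while 80 tps is what a lone request sees. The short sketch below converts between them. It assumes the aggregate is shared evenly across requests, which the post does not state. The helper names `per_request_tps` and `batching_gain` are hypothetical and used only for illustration.

```python
# Figures as reported; assumes 1000 tps is the aggregate generation
# throughput summed over all 128 concurrent requests, shared evenly.
AGGREGATE_TPS = 1000.0
CONCURRENT_REQUESTS = 128
SINGLE_STREAM_TPS = 80.0  # batch size 1


def per_request_tps(aggregate_tps: float, concurrency: int) -> float:
    """Average decode speed each request sees under an even share (hypothetical helper)."""
    return aggregate_tps / concurrency


def batching_gain(aggregate_tps: float, single_stream_tps: float) -> float:
    """Ratio of total batched output to batch-size-1 output (hypothetical helper)."""
    return aggregate_tps / single_stream_tps


if __name__ == "__main__":
    per_req = per_request_tps(AGGREGATE_TPS, CONCURRENT_REQUESTS)
    gain = batching_gain(AGGREGATE_TPS, SINGLE_STREAM_TPS)
    print(f"per-request speed: {per_req:.1f} tps")
    print(f"aggregate gain over batch size 1: {gain:.1f}x")
```

Under that assumption, each of the 128 streams decodes at roughly 7.8 tps, about a tenth of the single-user speed. Total output, however, is about 12.5 times higher. Batching trades per-request speed for overall throughput.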
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Demonstrates high inference speeds for a 27B parameter model, potentially enabling more efficient local deployments.
RANK_REASON User-reported benchmark of an open-source model on specific hardware.