Qwen3.6 27B model hits 1000 tps on V100 GPUs

By PulseAugur Editorial · [1 source] · 2026-05-25 04:42

A user on Reddit's r/LocalLLaMA forum reported reaching 1000 tokens per second (tps) of generation throughput with the Qwen3.6 27B model on NVIDIA V100 GPUs while serving 128 concurrent requests. For a single user (batch size 1), generation ran at approximately 80 tps and prompt processing at around 3000 tps. The post does not mention any limitations related to MTP (multi-token prediction).
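To put the two reported figures side by side, the arithmetic can be sketched as follows. This is a rough illustration only: it assumes the 1000 tps figure is the aggregate across all 128 requests and that throughput is shared evenly among them, neither of which the summary states explicitly. The function names are hypothetical.

```python
def per_request_tps(aggregate_tps: float, concurrent_requests: int) -> float:
    """Average generation speed seen by one request, assuming an even split."""
    return aggregate_tps / concurrent_requests


def batching_gain(aggregate_tps: float, single_stream_tps: float) -> float:
    """How many times more total tokens/s batching yields versus one stream."""
    return aggregate_tps / single_stream_tps


# Figures taken from the summary; their interpretation is an assumption.
AGGREGATE_TPS = 1000.0      # reported with 128 concurrent requests
CONCURRENT_REQUESTS = 128
SINGLE_STREAM_TPS = 80.0    # reported at batch size 1

per_request = per_request_tps(AGGREGATE_TPS, CONCURRENT_REQUESTS)
gain = batching_gain(AGGREGATE_TPS, SINGLE_STREAM_TPS)

print(f"Per-request speed at 128 concurrent: {per_request:.2f} tps")
print(f"Aggregate gain over batch size 1:    {gain:.1f}x")
```

Under these assumptions, each of the 128 requests would see roughly 7.8 tps, while total throughput is 12.5 times the single-user figure. In other words, the headline number reflects serving capacity and not the speed any one user would experience.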

IMPACT Demonstrates high inference speeds for a 27B parameter model, potentially enabling more efficient local deployments.

RANK_REASON User-reported benchmark of an open-source model on specific hardware.

Read on r/LocalLLaMA →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/Simple_Library_2700 · 2026-05-25 04:42

1000 tps generation on Qwen3.6 27B with V100s

https://www.reddit.com/r/LocalLLaMA/comments/1tmyln6/1000_tps_generation_on_qwen36_27b_with_v100s/
