Qwen 3.6 benchmarks on 2x RTX PRO 6000
A Reddit user shared performance benchmarks for the Qwen 3.6 large language model, covering the 27B and 35B parameter versions. The tests ran on two RTX PRO 6000 GPUs with the latest stable vLLM release as the serving backend. Throughput varied with the concurrency level and with whether multi-token prediction (MTP) was enabled; the 35B model reached up to 3500 tokens per second at a concurrency of 128.
AI IMPACT: Provides performance data for Qwen 3.6 that can help developers choose hardware and plan deployments for local LLM applications.
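To put the headline figure in per-user terms, here is a minimal sketch of the arithmetic. It assumes the 3500 tokens per second is the aggregate across all 128 concurrent requests and that the requests share throughput evenly; the summary does not state either point explicitly. The function name `per_stream_throughput` is illustrative only.

```python
def per_stream_throughput(aggregate_tps: float, concurrency: int) -> float:
    """Average tokens/s seen by each request, assuming even sharing."""
    if concurrency <= 0:
        raise ValueError("concurrency must be positive")
    return aggregate_tps / concurrency


# Figures from the summary: 35B model, 128 concurrent requests.
# Assumption: 3500 tokens/s is the total across all requests.
rate = per_stream_throughput(3500, 128)
print(f"{rate:.1f} tokens/s per request")  # prints "27.3 tokens/s per request"
```

Under those assumptions, each request would see roughly 27 tokens per second, which is the number that matters for interactive latency when sizing hardware.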