A Reddit user shared throughput benchmarks for the Qwen 3.6 large language model, testing the 27B and 35B parameter versions on two RTX PRO 6000 GPUs with the latest stable vLLM release as the serving backend. Throughput varied with the number of concurrent requests and with whether multi-token prediction (MTP) was enabled. The 35B model reached up to 3,500 tokens per second at a concurrency of 128.
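Aggregate throughput at high concurrency is not the speed a single user sees. The sketch below gives a rough estimate of the average per-request rate by dividing the aggregate figure by the concurrency. It assumes that the 3,500 tokens per second counts tokens across all requests and that all 128 requests decode for the whole measurement window. The helper name `per_request_rate` is hypothetical and does not come from the post.

```python
def per_request_rate(aggregate_tok_s: float, concurrency: int) -> float:
    """Estimate the average decode rate each request sees.

    Simplifying assumption: every request is active for the entire
    measurement window, so the aggregate splits evenly across requests.
    """
    if concurrency <= 0:
        raise ValueError("concurrency must be positive")
    return aggregate_tok_s / concurrency


if __name__ == "__main__":
    # Aggregate figure reported for the 35B model at 128 concurrent requests
    rate = per_request_rate(3500, 128)
    print(f"{rate:.1f} tokens/s per request")
```

Under these assumptions each request decodes at roughly 27 tokens per second, far below the headline number. Real per-request speed also depends on prompt length, prefill overhead, and how evenly requests overlap.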
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Provides performance data for Qwen 3.6, aiding developers in hardware selection and deployment for local LLM applications.
RANK_REASON User-generated benchmarks for an open-source model release.