A user on Reddit shared performance benchmarks for the Qwen 3.6 large language model, specifically testing the 27B and 35B parameter versions. The tests were run on two RTX PRO 6000 GPUs with the latest stable vLLM release as the backend. Throughput varied with the concurrency level and with whether multi-token prediction (MTP) was enabled; the 35B model reached up to 3500 tokens per second at a concurrency of 128.
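To put the headline number in per-request terms, the arithmetic can be sketched as below. This is a rough illustration only: it assumes the 3500 tokens per second figure is aggregate generation throughput summed over all 128 concurrent requests (the summary does not say so explicitly), that the requests share it evenly, and that prompt processing time is ignored. The function names and the 500-token response length are hypothetical and not taken from the benchmark.

```python
def per_stream_throughput(aggregate_tokens_per_s: float, concurrency: int) -> float:
    """Average tokens/s seen by one request, assuming an even split
    of the aggregate throughput across all concurrent requests."""
    if concurrency <= 0:
        raise ValueError("concurrency must be a positive integer")
    return aggregate_tokens_per_s / concurrency


def seconds_per_response(response_tokens: int,
                         aggregate_tokens_per_s: float,
                         concurrency: int) -> float:
    """Approximate generation time for one response, ignoring prefill."""
    return response_tokens / per_stream_throughput(aggregate_tokens_per_s, concurrency)


if __name__ == "__main__":
    # Figures from the summary: 3500 tokens/s at a concurrency of 128.
    rate = per_stream_throughput(3500, 128)
    print(f"per-stream rate: {rate:.1f} tokens/s")
    # Hypothetical 500-token response, chosen only for illustration.
    print(f"500-token response: {seconds_per_response(500, 3500, 128):.1f} s")
```

Under those assumptions each request would see roughly 27 tokens per second, so a high aggregate figure at high concurrency does not by itself imply fast individual responses.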
IMPACT Provides performance data for Qwen 3.6, aiding developers in hardware selection and deployment for local LLM applications.
RANK_REASON User-generated benchmarks for an open-source model release.
AI-generated summary · Google Gemini · from 1 source.