PulseAugur

Qwen 3.6 LLM benchmarks show high throughput on dual RTX PRO 6000

A Reddit user shared performance benchmarks for the Qwen 3.6 large language model, testing the 27B and 35B parameter versions. The tests ran on two RTX PRO 6000 GPUs with the latest stable vLLM backend. Throughput varied with the concurrency level and with whether multi-token prediction (MTP) was enabled; the 35B model reached up to 3500 tokens per second at a concurrency of 128.
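To put the headline figure in per-user terms, a rough sketch is shown here. It assumes the 3500 tokens per second is aggregate throughput across all 128 concurrent requests and that it is shared evenly between them; neither assumption is confirmed by the post, and the function name is hypothetical.

```python
def per_request_throughput(aggregate_tokens_per_s: float, concurrency: int) -> float:
    """Average tokens/s seen by one request.

    Assumption: the aggregate rate is split evenly across all
    concurrent requests, which real serving workloads only approximate.
    """
    if concurrency <= 0:
        raise ValueError("concurrency must be positive")
    return aggregate_tokens_per_s / concurrency


# Figures quoted in the summary: 35B model, 128 concurrent requests.
rate = per_request_throughput(3500, 128)
print(f"{rate:.1f} tokens/s per request")  # prints "27.3 tokens/s per request"
```

Under those assumptions each request would see roughly 27 tokens per second, which is the number that matters for interactive use rather than the aggregate.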

IMPACT Provides performance data for Qwen 3.6, aiding developers in hardware selection and deployment for local LLM applications.

RANK_REASON User-generated benchmarks for an open-source model release.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA TIER_1 · /u/mxforest

    Qwen 3.6 benchmarks on 2x RTX PRO 6000

    Got a chance to play around with a 2x RTX PRO 6000 setup, so sharing some numbers for Qwen 3.6. All of these were run using the latest stable vLLM backend. This was for a personal project.

    Qwen 3.6 27B BF16 (original, without any quantization)…