tool · [1 source] · 2026-05-25 06:35

Qwen 3.6 LLM benchmarks show high throughput on dual RTX PRO 6000

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 1 source

A Reddit user shared performance benchmarks for the Qwen 3.6 large language model, covering the 27B and 35B parameter versions. The tests ran on two RTX PRO 6000 GPUs with the latest stable vLLM backend. Throughput varied with the concurrency level and with whether multi-token prediction (MTP) was enabled; the 35B model reached up to 3500 tokens per second at a concurrency of 128.
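To put the headline figure in per-user terms, the arithmetic can be sketched as below. This is a minimal sketch that assumes the 3500 tokens per second is an aggregate across all 128 concurrent requests and that load is split evenly between them; the summary does not confirm either point, and the function name `per_request_throughput` is hypothetical.

```python
def per_request_throughput(aggregate_tokens_per_s: float, concurrency: int) -> float:
    """Average tokens/s seen by one request.

    Assumption (not confirmed by the source): the aggregate figure is
    shared evenly across all concurrent requests.
    """
    if concurrency <= 0:
        raise ValueError("concurrency must be a positive integer")
    return aggregate_tokens_per_s / concurrency


# Figures quoted in the summary: 3500 tokens/s at a concurrency of 128.
rate = per_request_throughput(3500, 128)
print(f"{rate:.1f} tokens/s per request")  # prints "27.3 tokens/s per request"
```

Under those assumptions each request would see roughly 27 tokens per second, which is the number that matters for interactive use rather than the aggregate.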

IMPACT Provides performance data for Qwen 3.6, aiding developers in hardware selection and deployment for local LLM applications.

RANK_REASON User-generated benchmarks for an open-source model release.

COVERAGE [1]

r/LocalLLaMA TIER_1 Deutsch(DE) · /u/mxforest · 2026-05-25 06:35

Qwen 3.6 benchmarks on 2x RTX PRO 6000

Got a chance to play around with 2x RTX PRO 6000 setup so sharing some number for Qwen 3.6.
All these were run using latest stable VLLM backend. This was for a personal project.

Qwen 3.6 27B BF16 (Original without any quantization)…
