Qwen3.6 model hits 125 tokens/sec on dual RTX 4060 Ti setup

By PulseAugur Editorial · [1 source] · 2026-05-30 12:31

A user on Reddit's r/LocalLLaMA community reported running the Qwen3.6 model with a q4xl quantization at 125 tokens per second on a dual RTX 4060 Ti setup. The configuration reportedly costs under $1000, draws approximately 300 watts, and outperforms more expensive mini PCs released in 2026. The user is looking for ways to optimize the setup further and reach 150 tokens per second.
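The perf/dollar claim can be checked with simple arithmetic. The sketch below uses only the figures quoted in the summary; it treats the "under $1000" cost as an upper bound and the 300 watt draw as approximate, so the results are rough bounds rather than measurements. The function and constant names are illustrative and do not come from the original post.

```python
# Figures quoted in the summary. Cost is an upper bound ("under $1000")
# and power is approximate ("approximately 300 watts").
TOKENS_PER_SEC = 125.0
TARGET_TOKENS_PER_SEC = 150.0
COST_USD_UPPER_BOUND = 1000.0
POWER_WATTS_APPROX = 300.0


def throughput_per_dollar(tokens_per_sec: float, cost_usd: float) -> float:
    """Tokens per second obtained per dollar of hardware cost."""
    return tokens_per_sec / cost_usd


def joules_per_token(power_watts: float, tokens_per_sec: float) -> float:
    """Energy per generated token (watts are joules per second)."""
    return power_watts / tokens_per_sec


def required_speedup_pct(current: float, target: float) -> float:
    """Percentage increase needed to go from current to target throughput."""
    return (target / current - 1.0) * 100.0


if __name__ == "__main__":
    per_dollar = throughput_per_dollar(TOKENS_PER_SEC, COST_USD_UPPER_BOUND)
    energy = joules_per_token(POWER_WATTS_APPROX, TOKENS_PER_SEC)
    speedup = required_speedup_pct(TOKENS_PER_SEC, TARGET_TOKENS_PER_SEC)
    # Cost is an upper bound, so throughput per dollar is a lower bound.
    print(f"at least {per_dollar:.3f} tok/s per dollar")
    print(f"about {energy:.1f} J per token")
    print(f"{speedup:.0f}% more throughput needed to reach the target")
```

Under these assumptions the setup delivers at least 0.125 tokens per second per dollar, spends about 2.4 joules per token, and needs a 20% throughput gain to reach the 150 tokens per second goal.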

IMPACT Suggests that a sub-$1000 dual-GPU setup can run a quantized large language model locally at high throughput, based on a single user report.

RANK_REASON User-reported performance benchmark of an open-source model on consumer hardware.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/Chuyito · 2026-05-30 12:31

125 tok/s for Qwen3.6 q4xl on 2x 4060ti is insane perf/dollar

https://www.reddit.com/r/LocalLLaMA/comments/1tryp2q/125_toks_for_qwen36_q4xl_on_2x_4060ti_is_insane/

