PulseAugur

User seeks Qwen3.6 MoE speedup with MTP optimization

A user on the r/LocalLLaMA subreddit is asking why MTP (Multi-Token Prediction) brings no speedup when running the Qwen3.6-35B MoE model on an RTX 5060 Ti. After following the Unsloth guide and adjusting various flags, the user measured roughly 60 tokens/second both with and without MTP, and is looking for an explanation of why the expected gain does not appear.
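To make the comparison concrete, here is a minimal Python sketch of how the two throughput figures could be compared. The helper names (`speedup`, `is_meaningful`) and the 5% noise tolerance are assumptions introduced for illustration; they are not part of the original post or of any llama.cpp tooling. Only the 60 tok/s figures come from the post.

```python
def speedup(baseline_tps: float, mtp_tps: float) -> float:
    """Ratio of MTP throughput to baseline throughput (1.0 means no gain)."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return mtp_tps / baseline_tps


def is_meaningful(ratio: float, tolerance: float = 0.05) -> bool:
    """Treat differences within +/- tolerance as run-to-run noise.

    The 5% default is an assumed threshold, not a measured one.
    """
    return abs(ratio - 1.0) > tolerance


# Figures reported in the post: roughly 60 tok/s in both configurations.
ratio = speedup(baseline_tps=60.0, mtp_tps=60.0)
print(f"speedup: {ratio:.2f}x, meaningful: {is_meaningful(ratio)}")
```

With the reported numbers the ratio is 1.00, which matches the user's observation that MTP made no measurable difference.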



AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA TIER_1 English (EN) · /u/redblood252

    MTP has no impact on my Qwen3.6 MoE performance

    Hello, I have an RTX 5060 Ti and I tried running unsloth's Qwen3.6-35B GGUF with MTP. However, with and without MTP I get around 60 tok/s.

    Here are my flags:

        llama-server -hf unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q4_K_M --temp 0.6 --top-p …