Brief · PulseAugur

MEME · r/LocalLLaMA English(EN) · 4h

MTP has no impact on my Qwen3.6 MoE performance

A user on the r/LocalLLaMA subreddit is asking for help with the Qwen3.6-35B MoE model and MTP (Multi-Token Prediction). Despite following the unsloth guide and adjusting various flags, the user saw no speedup in token generation with MTP enabled: both the MTP and non-MTP versions run at approximately 60 tokens/second. They are looking for insight into why MTP is not delivering the expected performance gain.

unsloth
Qwen3.6-35B
RTX 5060 Ti