A user on the r/LocalLLaMA subreddit is asking for help with the performance of the Qwen3.6-35B MoE model when using the MTP (Multi-Token Prediction) optimization. Despite following the Unsloth guide and adjusting various flags, the user saw no speedup in token generation between the MTP and non-MTP versions: both run at approximately 60 tokens/second. They are looking for insight into why MTP is not delivering the expected performance gain.
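As a rough illustration of the comparison being described, the sketch below computes the throughput ratio between an MTP run and a non-MTP run. The function name `speedup` is hypothetical, and the two figures are the approximate values reported in the post, not new measurements.

```python
def speedup(baseline_tps: float, mtp_tps: float) -> float:
    """Return the throughput of the MTP run relative to the baseline run."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return mtp_tps / baseline_tps


# Approximate tokens/second figures reported in the post (assumed exact here).
baseline_tps = 60.0  # non-MTP version
mtp_tps = 60.0       # MTP version

ratio = speedup(baseline_tps, mtp_tps)
print(f"MTP speedup: {ratio:.2f}x")
```

A ratio of 1.00x corresponds to the "no speedup" result the user describes; any real benefit from MTP would show up as a ratio above 1.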
RANK_REASON User-generated content on a forum discussing technical performance of a specific model and optimization, lacking broader industry significance.
AI-generated summary · Google Gemini · from 1 source.