A user on Reddit's r/LocalLLaMA subreddit is asking for help understanding an unexpected performance gain when running the Qwen3.6-35B-A3B-UD-Q4_K_XL model. Inference speed doubled, from 17 to 34 tokens/second, after they raised the `--n-cpu-moe` parameter (which keeps the MoE expert weights of the first N layers on the CPU) from 8 to 30, even though they expected the added CPU load to reduce performance. The user is also asking about further optimizations for their setup: 12GB VRAM and 32GB RAM, running llama.cpp with the TurboQuant variant.
Summary written by gemini-2.5-flash-lite from 1 source.
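The comparison described in the summary (the same model run with `--n-cpu-moe` set to 8 and then 30) can be sketched as a small helper that builds one command line per setting and computes the resulting speedup. This is a minimal sketch under stated assumptions, not the user's actual setup: the binary name `llama-cli`, the model filename, and the `-ngl 99` value are illustrative placeholders, and the TurboQuant variant is assumed to accept the same `-m`, `-ngl`, and `--n-cpu-moe` flags as upstream llama.cpp. The helper only constructs argument lists; it does not run anything.

```python
from typing import List


def build_sweep_commands(
    binary: str,
    model_path: str,
    n_cpu_moe_values: List[int],
    n_gpu_layers: int = 99,  # assumption: offload all layers, then pull experts back to CPU
) -> List[List[str]]:
    """Return one argv list per --n-cpu-moe value. Nothing is executed."""
    commands = []
    for n in n_cpu_moe_values:
        commands.append(
            [
                binary,
                "-m", model_path,
                "-ngl", str(n_gpu_layers),
                "--n-cpu-moe", str(n),
            ]
        )
    return commands


def speedup(baseline_tps: float, new_tps: float) -> float:
    """Ratio of new to baseline throughput in tokens/second."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return new_tps / baseline_tps


if __name__ == "__main__":
    # Hypothetical binary and model filename, used for illustration only.
    for argv in build_sweep_commands("llama-cli", "model.gguf", [8, 30]):
        print(" ".join(argv))
    # Throughput figures reported in the summary: 17 and 34 tokens/second.
    print(f"speedup: {speedup(17, 34):.1f}x")
```

Sweeping several values between the two reported settings, rather than comparing only the endpoints, would show where throughput changes for a given VRAM budget.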