Brief · PulseAugur

MEME · r/LocalLLaMA English(EN) · 2d

Could someone please help explain these results?

A user on r/LocalLLaMA is asking for help understanding an unexpected speedup when running the Qwen3.6-35B-A3B-UD-Q4_K_XL model. Inference speed doubled, from 17 to 34 tokens/second, after they raised llama.cpp's `--n-cpu-moe` parameter from 8 to 30. The user had expected the opposite, since a higher value keeps the expert weights of more layers on the CPU and so increases CPU load. They are also asking about further optimizations for their setup: 12GB VRAM and 32GB RAM, running the TurboQuant variant of llama.cpp.

llama.cpp
r/LocalLLaMA
Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf