User seeks advice on Qwen 3.6 35B MoE quantization for coding

By PulseAugur Editorial · [1 sources] · 2026-06-12 00:24

A user on the r/LocalLLaMA subreddit is asking whether to use the IQ3_M or the IQ4_NL quantization of the Qwen 3.6 35B MoE model. The choice trades output quality against VRAM usage: IQ3_M fits entirely in the user's 16GB of VRAM, while IQ4_NL may exceed it and spill into system RAM. The user runs the model for 'vibe coding' with Ollama and Aider, and is weighing a possible loss of logic and syntax precision at the lower bit width against the speed benefit of keeping the model entirely within VRAM.
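The fit-in-VRAM question can be roughed out with simple arithmetic. The sketch below is illustrative only and rests on several assumptions that are not from the original post: the bits-per-weight figures are the nominal values listed by llama.cpp's quantize tool (about 3.66 for IQ3_M and 4.50 for IQ4_NL), the parameter count is taken from the model name, 16GB is treated as 16 GiB, and the 1 GiB allowance for KV cache and runtime buffers is a hypothetical placeholder. Real GGUF files mix quantization types across tensors, so actual sizes will differ.

```python
# Rough estimate of whether a quantized model's weights fit in VRAM.
# All constants below are assumptions for illustration, not measured values.

NOMINAL_BPW = {
    "IQ3_M": 3.66,   # nominal bits per weight (assumed)
    "IQ4_NL": 4.50,  # nominal bits per weight (assumed)
}

PARAMS_BILLION = 35.0   # taken from the model name "35B"
VRAM_GIB = 16.0         # the card in the post, treated as 16 GiB
OVERHEAD_GIB = 1.0      # hypothetical allowance for KV cache and buffers


def estimate_weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GiB: parameters * bits per weight / 8 bytes."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits_in_vram(weights_gib: float, vram_gib: float, overhead_gib: float) -> bool:
    """True if weights plus the assumed overhead stay within the VRAM budget."""
    return weights_gib + overhead_gib <= vram_gib


if __name__ == "__main__":
    for name, bpw in NOMINAL_BPW.items():
        size = estimate_weights_gib(PARAMS_BILLION, bpw)
        verdict = "fits" if fits_in_vram(size, VRAM_GIB, OVERHEAD_GIB) else "spills to system RAM"
        print(f"{name}: ~{size:.1f} GiB of weights, {verdict}")
```

Under these assumptions the estimate agrees with the post: IQ3_M lands just inside the budget and IQ4_NL exceeds it. The IQ3_M margin is thin, so a longer context window or larger runtime buffers could change the result.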

IMPACT User-level discussion on optimizing local LLM performance for coding tasks.

RANK_REASON User discussion about model quantization and performance trade-offs.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/unkclxwn · 2026-06-12 00:24

Qwen 3.6 35B MoE: IQ3_M vs IQ4_NL for Aider/vibe coding?

Rn im running Ollama + Aider on Linux (rx9070xt 16GB, 32GB ram). This is strictly for vibe coding, nothing enterprise
im trying to decide between IQ3_M and IQ4_NL for “Qwen 3.6 35B-A3B MoE”

IQ3_M fits entirely in my 16GB vram. IQ4…
