Brief · PulseAugur

TOOL · r/LocalLLaMA English (EN) · 6h

QAT MTP Heads Upload + PARALLEL=2 Fix + 12B 2-slot Bench

The Gemma 4 QAT (quantization-aware training) MTP (multi-token prediction) assistant heads have been released on Hugging Face for use in speculative decoding. The heads are trained to match the quantized weights of the Gemma 4 QAT models. According to the post, this gives substantially higher draft acceptance rates than heads that are not QAT-matched. Separately, a crash in the llama.cpp implementation when running with two parallel slots (PARALLEL=2) has been identified and fixed, improving stability for local inference. The post also includes a two-slot benchmark of the 12B model.
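Acceptance rate matters because it sets how many tokens each verification pass of the full model produces. The sketch below uses the standard simplified speculative-decoding model. It makes the following assumptions:

- Each drafted token is accepted independently with a constant probability `alpha`.
- `gamma` tokens are drafted per step.
- The target model always contributes one token of its own.

The function name and the alpha values are illustrative assumptions. They are not figures from the post or its benchmark.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Simplified model (an assumption, not the llama.cpp implementation):
    each of the `gamma` drafted tokens is accepted independently with
    probability `alpha`, and drafting stops at the first rejection. The
    target model always adds one token (a correction or a bonus token).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        # Every draft is accepted, plus the target model's bonus token.
        return float(gamma + 1)
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^gamma
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    # Hypothetical acceptance rates, chosen only to show the trend.
    for alpha in (0.5, 0.7, 0.9):
        tokens = expected_tokens_per_step(alpha, gamma=3)
        print(f"alpha={alpha}: {tokens:.2f} tokens per verification pass")
```

Under this model, raising per-token acceptance from 0.7 to 0.9 with three drafted tokens lifts the expected output from about 2.5 to about 3.4 tokens per pass. This is why heads matched to the quantized weights can matter for throughput.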

IMPACT Enables more efficient local inference of Gemma 4 models in two ways: it provides QAT-matched MTP heads for speculative decoding, and it fixes a crash when serving with two parallel slots in llama.cpp.

Google
llama.cpp
Gemma 4
Atomic TurboQuant