Brief · PulseAugur

TOOL · r/LocalLLaMA · English (EN) · 4h

Gemma 4 31B QAT GGUF loads with MTP branch, but outputs repeated <unused49> - any working recipe?

A user on the r/LocalLLaMA subreddit is asking for help running the Gemma 4 31B QAT GGUF model. The main model and an MTP assistant head both load successfully, but generation produces only repeated <unused49> tokens instead of coherent text. The user has tried different model files, local compatibility fixes, and command-line arguments without finding a working configuration.

IMPACT A working recipe for this configuration could help other users who hit the same repeated-token output in local LLM deployments.

llama.cpp
unsloth
GGUF
RTX 5090
Gemma 4
boxwrench