Gemma 4 31B QAT GGUF loads with MTP branch, but outputs repeated <unused49> - any working recipe?

A user seeks a fix for Gemma 4 31B repeatedly outputting the same token

By the PulseAugur editorial team · [1 source] · 2026-06-07 09:02

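A user on Reddit's r/LocalLLaMA forum is asking for help running the Gemma 4 31B QAT GGUF model. The main model and the MTP assistant head both load successfully, but the model consistently outputs repeated <unused49> tokens instead of coherent text. The user has tried several configurations, including different model files, local compatibility fixes, and different command-line arguments, but has not yet found a working setup.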

Impact: Troubleshooting this specific model configuration may help other users who hit similar problems when deploying LLMs locally.

Ranking rationale: A user-generated technical support request about a specific model version and file format.


Model releases

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/LocalLLaMA · /u/WaveformEntropy · 2026-06-07 09:02

Gemma 4 31B QAT GGUF loads with MTP branch, but outputs repeated <unused49> - any working recipe?

I'm trying to run: unsloth/gemma-4-31B-it-qat-GGUF gemma-4-31B-it-qat-UD-Q4_K_XL.gguf on an RTX 5090 32GB using llama.cpp Gemma 4 MTP PR branch. Main model loads. Without the MTP assistant head, /v1/chat/completions re…
