Long-context performance at lower quants

Qwen3.5 model performs poorly on long contexts at low quantization

By the PulseAugur editorial team · [1 source] · 2026-05-26 16:52

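A Reddit r/LocalLLaMA user running the Qwen3.5 122B A10B model reports a significant drop in quality once context use exceeds roughly 75-80k tokens: the model begins to hallucinate, forget information, and misattribute details. The user suspects the Q3 quantization level is responsible. They use Q3 because their system cannot run Q4 without swapping to disk. They are asking whether this is a model-specific issue, a limitation of quantization, or something that particular llama.cpp settings could mitigate, and note that they are already using a BF16 KV cache.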
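To make the Q3 versus Q4 trade-off above concrete, here is a rough back-of-the-envelope sketch of weight memory at two quantization levels. The bits-per-weight values are illustrative assumptions, not figures from the post: real GGUF files mix tensor types, so actual sizes differ. The helper name `weight_footprint_gib` is hypothetical, and the estimate leaves out the KV cache and runtime overhead.

```python
def weight_footprint_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB.

    Ignores the KV cache, activations, and runtime overhead.
    """
    return n_params * bits_per_weight / 8 / 2**30


# Total parameter count implied by the "122B" in the model name.
N_PARAMS = 122e9

# ASSUMPTION: rough average bits per weight for each quantization class.
# Real quantized files vary because different tensors use different types.
ASSUMED_BPW = {"Q3-class": 3.5, "Q4-class": 4.5}

for label, bpw in ASSUMED_BPW.items():
    print(f"{label}: ~{weight_footprint_gib(N_PARAMS, bpw):.1f} GiB of weights")
```

Under these assumed values, the gap between the two levels is on the order of ten or more GiB, which is consistent with a system that fits Q3 in memory but needs disk swap for Q4.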

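Impact: Highlights the potential limitations of quantized models when handling extended contexts, which affects their usability for complex tasks.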

Ranking rationale: User-generated discussion of model performance limitations.


Model release

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/LocalLLaMA TIER_1 English(EN) · /u/_TheWolfOfWalmart_ · 2026-05-26 16:52

Long-context performance at lower quants

I've been using Qwen3.5 122B A10B (Q3_K_XL) a lot lately for coding, and it's been pretty incredible overall like it feels not far off from frontier-level for most tasks -- but I've been noticing that usually once I hit around 75-80k context use,…
