In Q8_0 weight quantization, why can't we just skip blocks of 32 that have very large outliers?

LLM quantization question: skipping outlier blocks to improve accuracy

By the PulseAugur editorial team · [1 source] · 2026-06-02 20:51

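A user on r/LocalLLaMA is asking about advanced techniques for quantizing the weights of large language models. Specifically, they ask why, in Q8_0 quantization, blocks of 32 values that contain outliers cannot simply be skipped. The user suggests that keeping the native (unquantized) values for these blocks could significantly improve model accuracy, since fewer than 1% of sub-layers might need to be skipped.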
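To put the storage side of the proposal in numbers, here is a rough cost model. It assumes ggml's Q8_0 block layout (a 2-byte fp16 scale plus 32 int8 values, 34 bytes per block) and a hypothetical variant in which a skipped block is stored as 32 raw fp16 values, plus a per-block flag recording which format each block uses. Treating "native" as fp16, the flag size, and the name `avg_bits_per_weight` are all illustrative assumptions, not part of any existing format.

```python
# Rough storage model for the proposal: most blocks stay Q8_0,
# a small fraction is kept as unquantized fp16.
# Assumptions: "native" means fp16, and each block carries a hypothetical format flag.

WEIGHTS_PER_BLOCK = 32
Q8_0_BLOCK_BYTES = 2 + WEIGHTS_PER_BLOCK   # fp16 scale + 32 int8 values = 34 bytes
FP16_BLOCK_BYTES = 2 * WEIGHTS_PER_BLOCK   # 32 raw fp16 values = 64 bytes


def avg_bits_per_weight(skip_fraction: float, flag_bits_per_block: float = 1.0) -> float:
    """Average bits per weight when `skip_fraction` of blocks are stored as fp16.

    `flag_bits_per_block` models a hypothetical per-block marker that records
    which of the two formats a block uses.
    """
    if not 0.0 <= skip_fraction <= 1.0:
        raise ValueError("skip_fraction must be in [0, 1]")
    bytes_per_block = ((1.0 - skip_fraction) * Q8_0_BLOCK_BYTES
                       + skip_fraction * FP16_BLOCK_BYTES)
    return (bytes_per_block * 8 + flag_bits_per_block) / WEIGHTS_PER_BLOCK


if __name__ == "__main__":
    print(avg_bits_per_weight(0.0, flag_bits_per_block=0.0))  # plain Q8_0, no flags
    print(avg_bits_per_weight(0.01))                          # 1% of blocks kept in fp16
```

Plain Q8_0 comes out at 8.5 bits per weight; keeping 1% of blocks in fp16 with a one-bit flag gives roughly 8.61. The model counts bytes only; it does not account for how inference kernels would handle two block formats inside one tensor.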

Ranking rationale: the user raises questions about technical aspects of LLM quantization.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/LocalLLaMA · /u/fragment_me · 2026-06-02 20:51

Looking for someone with an expert-level understanding.

I understand that we can skip layers and sub-layers when doing quantization, but why can't we skip blocks? I am using Q8_0 as it's a simple example. Every block of 32 values has a sca…
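In Q8_0 every block of 32 weights shares one scale, so the block's largest magnitude sets the step size for all 32 values. The sketch below assumes the absmax scheme used by llama.cpp's reference Q8_0 quantizer (d = max|x| / 127, values rounded to int8, d stored as fp16); the helper names are illustrative, not ggml functions. It quantizes the same 31 ordinary weights with and without one outlier in the block and compares their reconstruction error.

```python
import numpy as np

BLOCK = 32  # Q8_0 groups weights into blocks of 32 values


def quantize_q8_0_block(x: np.ndarray) -> tuple[np.float16, np.ndarray]:
    """Absmax-quantize one block: a shared fp16 scale d and 32 int8 values."""
    assert x.shape == (BLOCK,)
    amax = float(np.max(np.abs(x)))
    d = amax / 127.0                      # the largest |x| maps to +/-127
    inv_d = 1.0 / d if d != 0.0 else 0.0
    # np.round rounds halves to even; C's roundf rounds them away from zero.
    q = np.round(x * inv_d).astype(np.int8)
    return np.float16(d), q


def dequantize_q8_0_block(d: np.float16, q: np.ndarray) -> np.ndarray:
    """Reconstruct approximate weights as q * d."""
    return q.astype(np.float32) * np.float32(d)


def mean_abs_error_excluding_first(x: np.ndarray) -> float:
    """Mean reconstruction error over x[1:], i.e. ignoring slot 0."""
    d, q = quantize_q8_0_block(x)
    x_hat = dequantize_q8_0_block(d, q)
    return float(np.mean(np.abs(x_hat[1:] - x[1:])))


rng = np.random.default_rng(0)
clean = rng.normal(0.0, 0.02, BLOCK).astype(np.float32)  # typical small weights
spiky = clean.copy()
spiky[0] = 1.0                                            # one large outlier in slot 0

print("error without outlier:", mean_abs_error_excluding_first(clean))
print("error with outlier:   ", mean_abs_error_excluding_first(spiky))
```

Because the rounding error on each value scales with d, adding the outlier raises the error on the other 31 weights roughly by the factor 1.0 / max|clean|. That lost resolution is what keeping such blocks unquantized would recover.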