PulseAugur
In Q8_0 weight quantization, why can't we just skip blocks of 32 that have very large outliers?

LLM quantization question: skipping outlier blocks to improve accuracy

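A user on r/LocalLLaMA is asking about an advanced aspect of weight quantization for large language models. Specifically, they question why blocks of 32 values in Q8_0 quantization cannot simply be skipped when they contain outliers. The user suggests that keeping the native values for those blocks could significantly improve model accuracy, since fewer than 1% of sub-layers might need to be skipped.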
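To make the question concrete, here is a minimal sketch of Q8_0-style block quantization in plain Python. It assumes the commonly described scheme of one scale per block of 32 values, with the scale set to the largest magnitude divided by 127, and it ignores the fact that the scale is normally stored in half precision. The `is_outlier_block` rule and its `ratio` threshold are hypothetical, introduced only to illustrate what "skipping" a block might mean; they are not taken from the post or from any library.

```python
import random
import statistics

BLOCK = 32  # number of weights that share one scale in Q8_0


def quantize_q8_0(block):
    """Return (scale, codes) for one block; codes lie in [-127, 127]."""
    amax = max(abs(v) for v in block)
    scale = amax / 127.0
    if scale == 0.0:
        return 0.0, [0] * len(block)
    return scale, [round(v / scale) for v in block]


def dequantize_q8_0(scale, codes):
    """Reconstruct approximate floats from the scale and integer codes."""
    return [scale * q for q in codes]


def rms_error(block):
    """Root-mean-square round-trip error for one block."""
    scale, codes = quantize_q8_0(block)
    restored = dequantize_q8_0(scale, codes)
    return (sum((a - b) ** 2 for a, b in zip(block, restored)) / len(block)) ** 0.5


def is_outlier_block(block, ratio=20.0):
    """Hypothetical rule: the largest magnitude dwarfs the median magnitude."""
    mags = [abs(v) for v in block]
    med = statistics.median(mags)
    return med > 0 and max(mags) > ratio * med


rng = random.Random(0)
normal = [rng.gauss(0.0, 0.02) for _ in range(BLOCK)]  # typical small weights
spiky = list(normal)
spiky[0] = 5.0  # one large outlier stretches the shared scale

err_normal = rms_error(normal)
err_spiky = rms_error(spiky)
print(f"RMS error without outlier: {err_normal:.6f}")
print(f"RMS error with outlier:    {err_spiky:.6f}")
print("flagged for skipping:", is_outlier_block(normal), is_outlier_block(spiky))
```

Because every value in a block shares one scale, a single large weight coarsens the grid for the other 31, which is the accuracy loss the poster wants to avoid. The sketch does not address how a file format would mark which blocks are stored unquantized.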



AI-generated summary · Google Gemini · from 1 source.

Sources [1]

  1. r/LocalLLaMA · /u/fragment_me

    In Q8_0 weight quantization, why can't we just skip blocks of 32 that have very large outliers?

    Looking for someone with an expert-level understanding.

    I understand that we can skip layers and sub-layers when doing quantization, but why can't we skip blocks? I am using Q8_0 as it's a simple example. Every block of 32 values has a sca…