Storing an index to a scale instead of the scale itself with Q4_0 quant reduces scale size by ~31% (small gain but interesting)

LLM quantization technique saves model storage space

By the PulseAugur editorial team · [1 source] · 2026-06-14 00:21

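A Reddit user on r/LocalLLaMA found a way to shrink the file size of quantized large language models by storing an index to each block's scale value instead of the scale value itself. The technique was demonstrated with Q4_0 quantization on the Qwen 3.5 2B and Qwen 3.6 27B models, and it reduces the size of the Qwen 3.6 27B model by about 318MB. The user walks through the math, showing how an 11-bit index per 32-weight block, in place of a 16-bit scale, recovers a meaningful amount of storage, with possible further savings in the token embeddings.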
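The per-block arithmetic behind the ~31% figure can be sketched as follows. This is a minimal illustration, not the poster's code: it assumes the standard Q4_0 layout of 32 four-bit weights plus one 16-bit scale per block, and the function names (`block_bits`, `scale_saving_fraction`, `bytes_saved`) are hypothetical. It also ignores the cost of the lookup table that the indices point into, which an 11-bit index caps at 2^11 = 2048 entries.

```python
# Sketch of the Q4_0 size arithmetic, assuming 32 weights per block,
# 4 bits per weight, and one 16-bit (fp16) scale per block.
BLOCK_WEIGHTS = 32
WEIGHT_BITS = 4
FP16_SCALE_BITS = 16
INDEX_BITS = 11  # index width quoted in the summary


def block_bits(scale_bits: int) -> int:
    """Total bits in one block for a given scale (or scale-index) width."""
    return BLOCK_WEIGHTS * WEIGHT_BITS + scale_bits


def scale_saving_fraction() -> float:
    """Fraction of scale storage saved by an 11-bit index versus a 16-bit scale."""
    return (FP16_SCALE_BITS - INDEX_BITS) / FP16_SCALE_BITS


def bytes_saved(num_q4_0_weights: int) -> float:
    """Bytes saved across all Q4_0 blocks, excluding the lookup table itself."""
    blocks = num_q4_0_weights // BLOCK_WEIGHTS
    return blocks * (FP16_SCALE_BITS - INDEX_BITS) / 8


if __name__ == "__main__":
    print(block_bits(FP16_SCALE_BITS) / BLOCK_WEIGHTS)  # bits per weight, fp16 scale
    print(block_bits(INDEX_BITS) / BLOCK_WEIGHTS)       # bits per weight, 11-bit index
    print(scale_saving_fraction())                      # share of scale storage saved
```

Under these assumptions the scale storage shrinks by 5/16 = 31.25%, while the whole block shrinks only from 144 to 139 bits, which is why the overall gain is small. The absolute saving for a given model depends on how many of its weights are actually stored as Q4_0, so this sketch does not attempt to reproduce the 318MB figure.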

Impact: May lower the storage requirements of quantized LLMs, making them easier to run on local hardware.

Ranking rationale: User-generated technical analysis and optimization proposal on LLM quantization.


Infrastructure

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/LocalLLaMA TIER_1 English(EN) · /u/fragment_me · 2026-06-14 00:21

Storing an index to a scale instead of the scale itself with Q4_0 quant reduces scale size by ~31% (small gain but interesting)

I've been having some fun looking at pre and post quant weights to try to identify some unique ideas on saving space or increasing accuracy.

I was originally looking at duplicate weights to determine if there's potential for trading a bit …

