GlowQ: Group-Shared LOw-Rank Approximation for Quantized LLMs

New methods tackle large language model quantization to improve efficiency and accuracy

By the PulseAugur editorial team · [8 sources] · 2026-04-30 18:55

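Researchers have developed several new methods for making large language models (LLMs) more efficient through quantization. OSAQ suppresses weight outliers by exploiting low-rank Hessian properties, enabling accurate low-bit weight-only quantization. BWLA introduces a framework for 1-bit weight quantization with low-bit activations that delivers significant inference speedups. AGoQ enables memory-efficient distributed training through layer-aware activation quantization and 8-bit gradient storage, reducing memory usage and speeding up training.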

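Impact: These advances in LLM quantization promise to substantially lower compute costs and memory requirements, enabling wider deployment of larger models and faster inference.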

Ranking rationale: Several arXiv papers introduce new techniques for LLM quantization, focusing on improvements in efficiency and accuracy.


AI-generated summary · Google Gemini · from 8 sources.

Sources [8]

arXiv cs.LG TIER_1 English(EN) · Zhikai Li, Zhen Dong, Xuewen Liu, Jing Zhang, Qingyi Gu · 2026-05-07 04:00

OSAQ: Outlier Self-Absorption for Accurate Low-Bit LLM Quantization

arXiv:2605.04738v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities. However, their massive parameter scale leads to significant resource consumption and latency during inference. Post-training weight-only quantization offers a p…
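For context, the usual baseline that post-training weight-only methods such as OSAQ improve on is plain round-to-nearest (RTN) quantization. The Python sketch below is a generic illustration, not OSAQ's algorithm; the function names, the 4-bit width and the group size of 4 are our own assumptions. It shows why weight outliers hurt: in the first hypothetical row, the single large weight sets the group's scale and the three small weights round to zero.

```python
# Generic RTN weight-only quantization for illustration; NOT OSAQ's method.
# Names, bit width and group size are hypothetical choices.


def quantize_weight_only(W, bits=4, group_size=4):
    """Symmetric round-to-nearest (RTN) weight-only quantization.

    Each row is split into contiguous groups of `group_size` weights, and each
    group gets its own scale so that its largest |w| maps to the top code.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit codes in [-8, 7]
    codes, scales = [], []
    for row in W:
        row_codes, row_scales = [], []
        for start in range(0, len(row), group_size):
            group = row[start:start + group_size]
            absmax = max(abs(w) for w in group)
            scale = absmax / qmax if absmax > 0 else 1.0
            row_scales.append(scale)
            # round() goes to the nearest integer (ties to even); clamp to the code range
            row_codes.extend(max(-qmax - 1, min(qmax, round(w / scale))) for w in group)
        codes.append(row_codes)
        scales.append(row_scales)
    return codes, scales


def dequantize(codes, scales, group_size=4):
    """Map integer codes back to approximate float weights."""
    return [
        [c * row_scales[i // group_size] for i, c in enumerate(row_codes)]
        for row_codes, row_scales in zip(codes, scales)
    ]


# Hypothetical weights: row 0 contains one large outlier, row 1 does not.
W = [
    [0.1, -0.2, 0.15, 8.0],
    [0.1, -0.2, 0.16, 0.3],
]
codes, scales = quantize_weight_only(W, bits=4, group_size=4)
print(codes)  # [[0, 0, 0, 7], [2, -5, 4, 7]]

# With group_size=2 the outlier sits in its own group,
# so the first two weights keep non-zero codes.
codes_small, _ = quantize_weight_only(W[:1], bits=4, group_size=2)
print(codes_small[0][:2])
```

Smaller groups isolate the outlier at the cost of storing more scales; OSAQ, according to the summary, instead works on suppressing the outliers themselves.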
arXiv cs.LG TIER_1 English(EN) · Zhixiong Zhao, Zukang Xu, Dawei Yang · 2026-05-04 04:00

BWLA: Breaking the Bottleneck of W1AX Post-Training Quantization for LLMs

arXiv:2605.00422v1 Announce Type: new Abstract: Large language models (LLMs) have driven major progress in NLP, yet their substantial memory and compute demands still hinder practical deployment. Binarization can compress weights to 1 bit, fundamentally lowering compute and bandw…
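BWLA targets 1-bit weights. As a generic illustration of what binarization means (not BWLA's method), the sketch below approximates a weight row as alpha * sign(w) with the classic closed-form per-row scale alpha = mean(|w|), then packs the signs at one bit per weight. Reading "W1AX" as 1-bit weights with X-bit activations is our interpretation of the title; activation quantization is not modeled, and all names are hypothetical.

```python
# Generic sign binarization for illustration; NOT BWLA's framework.


def binarize_row(row):
    """1-bit binarization of one weight row: w ≈ alpha * sign(w)."""
    # Map w >= 0 to +1 and w < 0 to -1 so every weight needs exactly one bit
    signs = [1 if w >= 0 else -1 for w in row]
    # For fixed signs, alpha = mean(|w|) minimizes sum((w - alpha * s) ** 2)
    alpha = sum(abs(w) for w in row) / len(row)
    return signs, alpha


def pack_signs(signs):
    """Pack +1/-1 signs into an int bitmask (bit i set <=> sign i is +1)."""
    bits = 0
    for i, s in enumerate(signs):
        if s > 0:
            bits |= 1 << i
    return bits


def unpack_signs(bits, n):
    """Recover n signs from a bitmask produced by pack_signs."""
    return [1 if (bits >> i) & 1 else -1 for i in range(n)]


row = [0.4, -0.2, 0.6, -0.8]  # hypothetical weight row
signs, alpha = binarize_row(row)
packed = pack_signs(signs)
approx = [alpha * s for s in unpack_signs(packed, len(row))]
print(signs, packed)  # [1, -1, 1, -1] 5
```

The optimal alpha follows from setting the derivative of the squared error to zero: sum(|w|) = alpha * n.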
arXiv cs.CL TIER_1 English(EN) · Wenxiang Lin, Juntao Huang, Luhan Zhang, Laili Li, Xiang Bao, Mengyang Zhang, Bing Wang, Shaohuai Shi · 2026-05-04 04:00

AGoQ: Activation and Gradient Quantization for Memory-Efficient Distributed Training of LLMs

arXiv:2605.00539v1 Announce Type: new Abstract: Quantization is a key method for reducing the GPU memory requirement of training large language models (LLMs). Yet, current approaches are ineffective for 4-bit activations and 8-bit gradients, which would easily cause slow converge…
arXiv cs.CL TIER_1 English(EN) · Shaohuai Shi · 2026-05-01 09:39

AGoQ: Activation and Gradient Quantization for Memory-Efficient Distributed Training of LLMs

Quantization is a key method for reducing the GPU memory requirement of training large language models (LLMs). Yet, current approaches are ineffective for 4-bit activations and 8-bit gradients, which would easily cause slow convergence or accuracy loss. To address this, we introd…
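The AGoQ abstract singles out 4-bit activations and 8-bit gradients as the hard case. One common generic ingredient for low-bit gradients is stochastic rounding, which keeps the quantized gradient unbiased in expectation. The sketch below is an illustration under our own assumptions, not AGoQ's scheme: one absmax scale per tensor, codes stored in a signed 8-bit array('b'), and hypothetical function names.

```python
# Generic int8 gradient quantization with stochastic rounding; NOT AGoQ's scheme.
import math
import random
from array import array


def quantize_grad_int8(grad, rng):
    """Per-tensor absmax int8 quantization with stochastic rounding.

    Codes are stored in array('b'), i.e. one signed byte per element.
    """
    absmax = max(abs(g) for g in grad)
    scale = absmax / 127 if absmax > 0 else 1.0
    codes = array("b")
    for g in grad:
        x = g / scale
        lo = math.floor(x)
        # Round up with probability equal to the fractional part, so E[code] == x
        q = lo + (1 if rng.random() < x - lo else 0)
        codes.append(max(-127, min(127, q)))
    return codes, scale


def dequantize(codes, scale):
    return [c * scale for c in codes]


rng = random.Random(0)  # seeded so the example is reproducible
grad = [0.5, -0.5, 0.001]  # hypothetical gradient values
codes, scale = quantize_grad_int8(grad, rng)
print(codes.itemsize)  # 1 (byte per element, versus 4 for float32)
```

Averaging many stochastic roundings of 0.001 approaches 0.001, whereas round-to-nearest would map it to 0 every time, since 0.001 / scale is only about 0.25.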
arXiv cs.AI TIER_1 English(EN) · Dawei Yang · 2026-05-01 05:42

BWLA: Breaking the Bottleneck of W1AX Post-Training Quantization for LLMs

Large language models (LLMs) have driven major progress in NLP, yet their substantial memory and compute demands still hinder practical deployment. Binarization can compress weights to 1 bit, fundamentally lowering compute and bandwidth cost. However, existing methods cannot addr…
arXiv cs.AI TIER_1 English(EN) · Selim An, Il hong Suh, Yeseong Kim · 2026-05-01 04:00

GlowQ: Group-Shared LOw-Rank Approximation for Quantized LLMs

arXiv:2603.25385v2 Announce Type: replace-cross Abstract: Quantization techniques such as BitsAndBytes, AWQ, and GPTQ are widely used as a standard method in deploying large language models but often degrades accuracy when using low-bit representations, e.g., 4 bits. Low-rank cor…
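GlowQ builds on low-rank correction of quantized weights. The generic idea is to approximate the quantization residual E = W - W_q with a low-rank product and run it as a small floating-point side path, y = W_q x + u (v . x). The plain-Python sketch below is a rank-1 illustration via power iteration, not GlowQ's group-shared scheme; the names are hypothetical, and the example residual is constructed to be exactly rank 1 so the correction recovers W.

```python
# Generic rank-1 low-rank correction for illustration; NOT GlowQ's group-shared method.
import math


def matvec(A, x):
    """Compute A x for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Compute A^T y."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def rank1_correction(W, W_q, iters=50):
    """Return (u, v), v unit-length, with W - W_q ≈ outer(u, v).

    Power iteration on E^T E finds the leading right singular vector v of the
    residual E; u = E v then carries the singular value.
    """
    E = [[w - q for w, q in zip(rw, rq)] for rw, rq in zip(W, W_q)]
    n = len(E[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        z = rmatvec(E, matvec(E, v))
        norm = math.sqrt(sum(t * t for t in z))
        if norm == 0.0:  # E^T E v == 0 implies E v == 0: nothing to capture
            break
        v = [t / norm for t in z]
    u = matvec(E, v)
    return u, v


def corrected_matvec(W_q, u, v, x):
    """y = W_q x + u (v . x): low-bit matmul plus a cheap rank-1 side path."""
    coeff = sum(a * b for a, b in zip(v, x))
    return [b + ui * coeff for b, ui in zip(matvec(W_q, x), u)]


# Hypothetical example: the residual W - W_q is exactly rank 1 by construction.
W_q = [[0.0, 1.0, 0.5], [1.0, 0.0, -0.5]]
W = [[0.3, 0.9, 0.55], [1.6, -0.2, -0.4]]  # W_q + outer([1, 2], [0.3, -0.1, 0.05])
u, v = rank1_correction(W, W_q)
x = [1.0, 2.0, 3.0]
exact = matvec(W, x)
approx = corrected_matvec(W_q, u, v, x)
print(max(abs(a - b) for a, b in zip(exact, approx)) < 1e-9)  # True
```

Keeping u and v separate means W_q stays in low-bit storage; only n + m extra floats are added per corrected matrix.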
arXiv cs.CV TIER_1 English(EN) · YiFeng Wang, Zhun Sun, Keisuke Sakaguchi · 2026-05-04 04:00

Technical Report: Activation Residual Hessian Quantization (ARHQ) for Low-Bit LLM Quantization

arXiv:2605.00140v1 Announce Type: cross Abstract: We present Activation Residual Hessian Quantization (ARHQ), a post-training weight splitting method designed to mitigate error propagation in low-bit activation-weight quantization. By constructing an input-side residual Hessian f…
arXiv cs.CV TIER_1 English(EN) · Keisuke Sakaguchi · 2026-04-30 18:55

Technical Report: Activation Residual Hessian Quantization (ARHQ) for Low-Bit LLM Quantization

We present Activation Residual Hessian Quantization (ARHQ), a post-training weight splitting method designed to mitigate error propagation in low-bit activation-weight quantization. By constructing an input-side residual Hessian from activation quantization residuals (G_x), ARHQ …
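The ARHQ abstract describes an input-side residual Hessian built from activation quantization residuals G_x. One plausible reading, sketched below as an assumption rather than the paper's construction, takes G_x = X - Q(X) over calibration tokens and forms H = G_x^T G_x (the layer-wise Hessian up to a factor of 2). The identity it demonstrates is exact: for any weight row w, w^T H w equals the squared output error ||G_x w||^2 injected by activation quantization. ARHQ's weight-splitting step is not reproduced; the names and the 4-bit per-token quantizer are hypothetical.

```python
# Hypothetical reading of an "input-side residual Hessian"; NOT ARHQ's implementation.


def fake_quant(row, bits=4):
    """Per-token symmetric absmax quantize-dequantize of one activation vector."""
    qmax = 2 ** (bits - 1) - 1
    absmax = max(abs(a) for a in row)
    scale = absmax / qmax if absmax > 0 else 1.0
    return [max(-qmax - 1, min(qmax, round(a / scale))) * scale for a in row]


def residual_hessian(X, bits=4):
    """Return G = X - Q(X) (tokens x features) and H = G^T G (features x features)."""
    G = [[a - q for a, q in zip(row, fake_quant(row, bits))] for row in X]
    d = len(X[0])
    H = [[sum(g[i] * g[j] for g in G) for j in range(d)] for i in range(d)]
    return G, H


def quad_form(H, w):
    """Compute w^T H w."""
    d = len(w)
    return sum(w[i] * H[i][j] * w[j] for i in range(d) for j in range(d))


# Hypothetical calibration activations (3 tokens, 4 features) and one weight row.
X = [
    [0.9, -0.31, 0.05, 0.47],
    [-1.2, 0.08, 0.66, 0.13],
    [0.25, 0.5, -0.72, 0.01],
]
w = [0.3, -0.7, 0.2, 0.5]
G, H = residual_hessian(X, bits=4)
# Squared output error caused by activation quantization alone, computed directly
direct = sum(sum(gi * wi for gi, wi in zip(g, w)) ** 2 for g in G)
print(abs(direct - quad_form(H, w)) < 1e-12)  # True
```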

