Researchers have introduced CoQuant, a mixed-precision quantization method for large language models (LLMs). It addresses limitations of existing approaches by considering weight and activation statistics jointly when identifying the critical subspaces to preserve in high precision. CoQuant models the quantization error theoretically and uses a weighted PCA solution to balance the two covariances, with the aim of reducing inference costs more effectively. In experiments on Llama-3.2 and Qwen2.5 models, CoQuant outperforms current post-training quantization baselines on perplexity and reasoning accuracy.
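The idea of balancing weight and activation covariances can be illustrated with a rough sketch. The code below is not CoQuant's actual algorithm: the trace normalization, the mixing weight `alpha`, the helper names (`joint_basis`, `fake_quant`, `quantize_rotated`), and the 8-bit/4-bit split are all assumptions chosen for illustration. It builds an orthogonal basis from a weighted sum of the two covariances, rotates the weights into that basis, and keeps the leading `k` directions at higher precision.

```python
import numpy as np


def joint_basis(W, X, alpha=0.5):
    """Orthogonal basis from a weighted mix of activation and weight covariances.

    W: (d_out, d_in) weight matrix; X: (n, d_in) calibration activations.
    alpha is a hypothetical mixing weight, not a parameter from the paper.
    Columns are sorted by decreasing eigenvalue.
    """
    cov_x = X.T @ X / X.shape[0]
    cov_w = W.T @ W / W.shape[0]
    # Normalize by trace so that neither covariance dominates by scale alone.
    cov_x = cov_x / np.trace(cov_x)
    cov_w = cov_w / np.trace(cov_w)
    mixed = alpha * cov_x + (1.0 - alpha) * cov_w
    _, eigvecs = np.linalg.eigh(mixed)  # eigenvalues come back in ascending order
    return eigvecs[:, ::-1]


def fake_quant(a, bits):
    """Symmetric per-tensor uniform quantization, returned in floating point."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(a)) / qmax
    if scale == 0:
        return a.copy()
    return np.clip(np.round(a / scale), -qmax, qmax) * scale


def quantize_rotated(W, Q, k, high_bits=8, low_bits=4):
    """Rotate W into the basis Q; quantize the first k directions at high_bits."""
    W_rot = W @ Q
    out = np.empty_like(W_rot)
    out[:, :k] = fake_quant(W_rot[:, :k], high_bits)
    out[:, k:] = fake_quant(W_rot[:, k:], low_bits)
    return out


# Toy usage with random data standing in for a real layer and calibration set.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
X = rng.standard_normal((64, 16))
Q = joint_basis(W, X)
W_q = quantize_rotated(W, Q, k=4)
# Because Q is orthogonal, (X @ Q) @ (W @ Q).T equals X @ W.T before quantization.
rel_err = np.linalg.norm((X @ Q) @ W_q.T - X @ W.T) / np.linalg.norm(X @ W.T)
print(f"relative output error: {rel_err:.4f}")
```

Since the basis is orthogonal, rotating both activations and weights leaves the layer output unchanged until quantization is applied, so the choice of basis only affects where the quantization error lands.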
Impact: Improves LLM efficiency by reducing inference costs through optimized mixed-precision quantization.
Ranking rationale: The cluster contains an academic paper detailing a new method for LLM quantization.
AI-generated summary · Google Gemini · From 3 sources.