PulseAugur

New method slashes LLM quantization bit-width with spectral rotations

A new arXiv paper introduces BBT-spectral, a method for quantizing large language models (LLMs) to extremely low bit-widths, specifically W2A16 (2-bit weights, 16-bit activations). The technique combines influence-inspired spectral rotations with a reconstruction-error quantizer and reportedly outperforms vanilla auto-round quantization on perplexity by 15-58% across various model sizes. The method has also been extended to handle architecture-specific challenges in Qwen3 and Qwen2.5, which suggests it adapts across LLM families.

Impact: This research could make LLMs cheaper to deploy on resource-constrained hardware by significantly reducing their memory footprint.

Ranking rationale: The cluster contains an academic paper detailing a new method for LLM quantization.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 1 source. How we write summaries →

Sources [1]

  1. arXiv cs.AI · TIER_1 · English (EN) · Gorgi Pavlov

    Influence-Inspired Spectral Rotations for Extreme Low-Bit LLM Quantization

    arXiv:2605.25203v1 Announce Type: cross Abstract: We apply the influence-adaptive Walsh geometry of a companion theory paper (arXiv:2605.01637) to extreme low-bit weight-only LLM quantization. The recipe is one math-invariant transformation: WHT-rotate each linear layer's weight …
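
The abstract calls the WHT rotation a "math-invariant transformation". The sketch below illustrates that general idea only, and it is not the paper's implementation. It assumes an orthonormal Sylvester-ordered Walsh-Hadamard matrix H (symmetric, with H·H = I), so that (W·H)(H·x) = W·x, and it swaps in a plain uniform round-to-nearest 2-bit quantizer in place of the reconstruction-error quantizer the paper describes. The influence-adaptive part of the method is not reproduced. All function names (`fwht`, `matvec`, `quantize_2bit`) are hypothetical.

```python
import math
import random


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform of a length-2^k list.

    Computes H @ vec for the Sylvester-ordered Hadamard matrix H scaled by
    1/sqrt(n), so H is symmetric and is its own inverse.
    """
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(vec)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def matvec(W, x):
    """Plain matrix-vector product for a list-of-rows matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def quantize_2bit(row):
    """Illustrative stand-in quantizer (an assumption, not the paper's):
    uniform round-to-nearest onto 4 levels spanning the row's min..max."""
    lo, hi = min(row), max(row)
    if hi == lo:
        return list(row)
    step = (hi - lo) / 3  # 4 levels -> 3 intervals
    return [lo + round((v - lo) / step) * step for v in row]


random.seed(0)
n = 8
W = [[random.gauss(0, 1) for _ in range(n)] for _ in range(4)]  # toy weights
x = [random.gauss(0, 1) for _ in range(n)]                      # toy activation

# Rotate each weight row (W @ H) and the input (H @ x).
W_rot = [fwht(row) for row in W]
x_rot = fwht(x)

# Before quantization, the rotated layer computes the same output.
y = matvec(W, x)
y_rot = matvec(W_rot, x_rot)
max_diff = max(abs(a - b) for a, b in zip(y, y_rot))

# Quantize the rotated weights to 2 bits; activations stay full precision.
W_q = [quantize_2bit(row) for row in W_rot]
y_q = matvec(W_q, x_rot)
```

Here `max_diff` stays at floating-point noise level, which is the invariance property: the rotation changes only what the quantizer sees, not what the unquantized layer computes. Whether `y_q` ends up closer to `y` than it would without the rotation depends on the weight distribution, and this toy example does not attempt to show that.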