Brief · PulseAugur

TOOL · arXiv cs.AI · 14h

Influence-Inspired Spectral Rotations for Extreme Low-Bit LLM Quantization

Researchers have introduced BBT-spectral, a method for quantizing large language models (LLMs) to extremely low bit-widths, specifically W2A16 (2-bit weights, 16-bit activations). It combines influence-inspired spectral rotations with a reconstruction-error quantizer to reduce perplexity, outperforming vanilla auto-round quantization by 15-58% across several model sizes. The authors also extend the method to handle architecture-specific challenges in Qwen3 and Qwen2.5, which suggests it adapts across LLM families.
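To illustrate why rotating weights before 2-bit quantization can help, here is a minimal sketch. It is not BBT-spectral: it assumes a fixed, normalized Hadamard rotation in place of the influence-inspired spectral rotation, and plain min-max round-to-nearest in place of the reconstruction-error quantizer. All function names and the toy weights are hypothetical.

```python
import math
import random


def quantize_2bit(group):
    """Asymmetric min-max round-to-nearest onto 4 levels; returns dequantized values."""
    lo, hi = min(group), max(group)
    scale = (hi - lo) / 3 or 1.0  # 4 levels -> 3 steps; guard against a constant group
    return [lo + scale * min(3, max(0, round((w - lo) / scale))) for w in group]


def hadamard(vec):
    """Normalized fast Walsh-Hadamard transform (orthogonal and self-inverse).

    The input length must be a power of two.
    """
    out = list(vec)
    n, h = len(out), 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(n)
    return [x / norm for x in out]


def mse(a, b):
    """Mean squared reconstruction error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Toy weight group (assumption): small Gaussian weights plus one large outlier.
random.seed(0)
weights = [random.gauss(0.0, 0.05) for _ in range(64)]
weights[0] = 4.0

# Direct 2-bit quantization: the outlier stretches the range, so most
# small weights collapse onto a single level.
err_plain = mse(weights, quantize_2bit(weights))

# Rotate, quantize in the rotated basis, then rotate back.
rotated = hadamard(weights)
restored = hadamard(quantize_2bit(rotated))
err_rotated = mse(weights, restored)

print(f"plain MSE:   {err_plain:.6f}")
print(f"rotated MSE: {err_rotated:.6f}")
```

In this toy case the rotation spreads the outlier's energy evenly across all 64 coordinates, so the four quantization levels cover a much narrower range. A learned or data-dependent rotation, as described in the brief, would choose the basis differently.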

IMPACT This research could enable more efficient deployment of LLMs on resource-constrained hardware by significantly reducing their memory footprint.
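As a rough back-of-the-envelope check on the memory claim, the sketch below estimates weight storage at 16-bit versus 2-bit precision. The 7-billion-parameter model size, the group size of 128, and the 16-bit per-group scale are illustrative assumptions, not figures from the paper, and `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params, weight_bits, group_size=None, scale_bits=16):
    """Estimate weight storage in GB (1e9 bytes).

    If group_size is given, one scale of scale_bits is stored per group,
    which adds scale_bits / group_size bits per weight.
    """
    bits_per_weight = weight_bits
    if group_size:
        bits_per_weight += scale_bits / group_size
    return num_params * bits_per_weight / 8 / 1e9


params = 7e9  # hypothetical model size

fp16 = weight_memory_gb(params, 16)
w2 = weight_memory_gb(params, 2)
w2_grouped = weight_memory_gb(params, 2, group_size=128)

print(f"16-bit weights:             {fp16:.2f} GB")
print(f"2-bit weights:              {w2:.2f} GB")
print(f"2-bit + per-group scales:   {w2_grouped:.2f} GB")
```

Activations, the KV cache, and any layers kept at higher precision are ignored here, so a real deployment would use more memory than this estimate.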

Intel
OpenVINO
LLM
Qwen3
Qwen2.5
ButterflyQuant
OmniQuant
BBT-spectral
AQLM
QuaRot
W2A16
QuIP-sharp