Brief · PulseAugur

RESEARCH · arXiv cs.LG · 22h · [2 sources]

Massive Spikes in LLMs are Bias Vectors: Mechanistic Uncovering and Spike-Free Quantization

Researchers report that massive activation spikes in Large Language Models (LLMs) are not simple scalar biases but structured bias vectors that appear at specific tokens. The model's projection weights and positional embeddings preserve these vectors even under perturbation. Because the spikes degrade quantization, the authors propose INSERTQUANT, a post-training quantization framework that clamps the spikes and then restores their function. They report that this enables robust, high-fidelity low-bit quantization across modalities.
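To make the quantization problem concrete, here is a minimal sketch of why a single massive activation hurts low-bit quantization and why clamping helps. It is not the INSERTQUANT algorithm, whose details the brief does not give. The 4-bit symmetric absmax quantizer, the synthetic activations, the spike value of 500, the clamp threshold of 6, and the choice to "restore" the spike by keeping it in full precision are all assumptions made for illustration.

```python
import random


def quantize_absmax(values, bits=4):
    """Symmetric per-tensor quantization: the largest magnitude sets the scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax
    # Quantize to the integer grid, then dequantize back to floats.
    return [round(v / scale) * scale for v in values]


def mean_abs_error(reference, approximation):
    return sum(abs(r - a) for r, a in zip(reference, approximation)) / len(reference)


random.seed(0)

# Hypothetical activations: ordinary small values plus one massive spike.
acts = [random.gauss(0.0, 1.0) for _ in range(1024)]
spike_index, spike_value = 7, 500.0  # illustrative position and magnitude
acts[spike_index] = spike_value

# Naive path: the spike dictates the scale, so ordinary values round to zero.
naive = quantize_absmax(acts)

# Clamped path: cap magnitudes before quantizing, then put the spike back
# outside the quantized path (an assumed stand-in for "restoring its function").
threshold = 6.0  # illustrative clamp value
clamped = [max(-threshold, min(threshold, v)) for v in acts]
restored = quantize_absmax(clamped)
restored[spike_index] = spike_value

# Compare the error on the ordinary (non-spike) activations only.
others = [i for i in range(len(acts)) if i != spike_index]
err_naive = mean_abs_error([acts[i] for i in others], [naive[i] for i in others])
err_clamped = mean_abs_error([acts[i] for i in others], [restored[i] for i in others])

print(f"error without clamping: {err_naive:.3f}")
print(f"error with clamping:    {err_clamped:.3f}")
```

In this toy setup the unclamped quantizer uses a step size of roughly 71, so every ordinary activation collapses to zero, while the clamped version keeps a step size below 1 and preserves most of the signal.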

IMPACT Enables more efficient low-bit quantization of LLMs, potentially reducing computational costs and memory requirements for deployment.

LLMs
ViTs
INSERTQUANT
Large Language Models