PulseAugur

New BCJR-QAT method pushes LLM quantization to 2 bits per weight

Researchers have developed BCJR-QAT, a method for quantizing large language models to 2 bits per weight that aims to go beyond what post-training quantization (PTQ) currently achieves. Trellis-coded quantization, as used in QTIP, already sets the 2-bit PTQ frontier, but training through the trellis has been blocked by the non-differentiable Viterbi argmax. BCJR-QAT replaces that hard search with a differentiable relaxation, which makes quantization-aware training (QAT) possible and yields lower (better) perplexity on benchmarks such as WikiText-2. The authors report that on models including Llama-3.2-1B the method outperforms existing approaches.

IMPACT Enables more efficient LLM deployment by reducing model size and computational requirements.
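To put 2 bits per weight in perspective: raw weight storage scales linearly with bits per weight, so going from 16-bit to 2-bit weights shrinks it by a factor of 8. A minimal sketch of that arithmetic follows, assuming a hypothetical model with exactly one billion weights and ignoring codebooks, scales, activations, and any layers kept at higher precision. The helper name is illustrative, not from the paper.

```python
def weight_storage_bytes(num_weights: int, bits_per_weight: float) -> float:
    """Bytes needed for the packed weights alone (no codebooks, scales, or metadata)."""
    return num_weights * bits_per_weight / 8


# Hypothetical round-number model size, for illustration only.
NUM_WEIGHTS = 1_000_000_000

fp16 = weight_storage_bytes(NUM_WEIGHTS, 16)
two_bit = weight_storage_bytes(NUM_WEIGHTS, 2)

print(f"16-bit weights: {fp16 / 1e9:.2f} GB")     # 16-bit weights: 2.00 GB
print(f"2-bit weights:  {two_bit / 1e9:.2f} GB")  # 2-bit weights:  0.25 GB
print(f"reduction:      {fp16 / two_bit:.0f}x")   # reduction:      8x
```

Actual on-disk and in-memory savings will be somewhat below 8x once that overhead is counted.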

RANK_REASON Publication of an academic paper detailing a new method for LLM quantization.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.LG · TIER_1 · English (EN) · Venugopalan Iyengar

    BCJR-QAT: A Differentiable Relaxation of Trellis-Coded Weight Quantization

    Trellis-coded quantization sets the current 2-bit post-training frontier for LLMs (QTIP), but pushing below the PTQ ceiling requires quantization-aware training, and QAT on a trellis is obstructed by the non-differentiable Viterbi argmax. We introduce BCJR-QAT, a relaxation that …
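The abstract names the obstacle: Viterbi picks one lowest-cost path through the trellis, so the quantized output is piecewise constant in the weights and its gradient is zero almost everywhere. The sketch below is a generic illustration of the kind of relaxation the name BCJR suggests, not the paper's algorithm. It puts a Gibbs distribution p(path) ∝ exp(-cost/τ) over trellis paths, runs a log-domain forward-backward pass (the computation the BCJR algorithm performs) to get per-step edge posteriors, and returns the expected reconstruction. That output is smooth in the weights for τ > 0 and approaches the Viterbi output as τ → 0. The 4-state trellis, the 8-level codebook, the temperature τ, and all function names are hypothetical choices, far smaller than anything used in practice.

```python
import numpy as np

# Toy 4-state trellis (hypothetical): from state s the encoder may move to
# (2*s) % 4 or (2*s + 1) % 4; each of the 8 allowed edges carries one level.
NUM_STATES = 4
ALLOWED = np.zeros((NUM_STATES, NUM_STATES), dtype=bool)
for s in range(NUM_STATES):
    ALLOWED[s, (2 * s) % NUM_STATES] = True
    ALLOWED[s, (2 * s + 1) % NUM_STATES] = True

VALUE = np.zeros((NUM_STATES, NUM_STATES))
VALUE[ALLOWED] = np.linspace(-1.5, 1.5, ALLOWED.sum())  # 8 distinct levels


def edge_costs(w):
    """Squared error of reconstructing w[t] with each edge; inf on forbidden edges."""
    cost = (w[:, None, None] - VALUE[None, :, :]) ** 2
    return np.where(ALLOWED[None, :, :], cost, np.inf)


def viterbi_reconstruct(w):
    """Hard quantization: minimum-cost trellis path (non-differentiable argmin)."""
    cost = edge_costs(w)
    T = len(w)
    dist = np.zeros(NUM_STATES)                   # any start state is free
    back = np.zeros((T, NUM_STATES), dtype=int)
    for t in range(T):
        total = dist[:, None] + cost[t]           # total[s, s_next]
        back[t] = np.argmin(total, axis=0)        # best predecessor of each s_next
        dist = np.min(total, axis=0)
    states = [int(np.argmin(dist))]               # cheapest end state
    for t in range(T - 1, -1, -1):
        states.append(int(back[t, states[-1]]))
    states.reverse()                              # edge t is states[t] -> states[t+1]
    return np.array([VALUE[states[t], states[t + 1]] for t in range(T)])


def soft_reconstruct(w, tau):
    """Relaxed quantization: expected reconstruction under p(path) ~ exp(-cost / tau),
    computed with a log-domain forward-backward pass. Smooth in w for tau > 0."""
    logpot = -edge_costs(w) / tau                 # -inf on forbidden edges
    T = len(w)
    alpha = np.zeros((T + 1, NUM_STATES))         # log forward messages
    beta = np.zeros((T + 1, NUM_STATES))          # log backward messages
    for t in range(T):
        alpha[t + 1] = np.logaddexp.reduce(alpha[t][:, None] + logpot[t], axis=0)
    for t in range(T - 1, -1, -1):
        beta[t] = np.logaddexp.reduce(logpot[t] + beta[t + 1][None, :], axis=1)
    log_z = np.logaddexp.reduce(alpha[T])
    # Posterior probability of each edge at each step; each xi[t] sums to 1.
    xi = np.exp(alpha[:-1, :, None] + logpot + beta[1:, None, :] - log_z)
    return np.sum(xi * VALUE[None, :, :], axis=(1, 2))


rng = np.random.default_rng(0)
w = rng.normal(size=8)
hard = viterbi_reconstruct(w)
for tau in (1.0, 0.1, 0.01):
    # The gap to the hard Viterbi output typically shrinks as tau decreases.
    print(tau, np.abs(soft_reconstruct(w, tau) - hard).max())
```

The temperature trades off gradient quality against fidelity to the hard quantizer: a large τ gives smooth but blurred reconstructions, while a small τ tracks Viterbi closely but yields sharper, higher-variance gradients. How BCJR-QAT itself handles this trade-off is not stated in the truncated abstract.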