PulseAugur
tool · [1 source]

New BCJR-QAT method pushes LLM quantization to 2 bits per weight

Researchers have developed BCJR-QAT, a method for quantizing large language models to 2 bits per weight, pushing past the ceiling set by current post-training quantization techniques. The approach replaces the non-differentiable Viterbi argmax used in trellis-coded quantization with a differentiable relaxation, enabling quantization-aware training, and it reports better perplexity on benchmarks such as WikiText-2. The method has been demonstrated on models such as Llama-3.2-1B, where it is reported to outperform existing methods.
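
To make the idea concrete, here is a rough sketch, not the paper's implementation: Viterbi decoding over a trellis commits to a single best path via a hard argmax, whereas a BCJR-style forward-backward pass swaps that max for log-sum-exp and yields soft per-step state posteriors, so the reconstructed weights stay differentiable. The 4-state trellis, reconstruction levels, and beta sharpness below are illustrative assumptions, not values from the paper.

    import torch

    # Hypothetical 4-state trellis: each state carries one of four reconstruction
    # levels (2 bits/weight) and has two outgoing edges. Structure and values are
    # assumptions for illustration only.
    S = 4
    levels = torch.linspace(-1.0, 1.0, S)
    trans = torch.full((S, S), float("-inf"))        # -inf marks disallowed transitions
    for s in range(S):
        trans[s, (2 * s) % S] = 0.0
        trans[s, (2 * s + 1) % S] = 0.0

    def soft_trellis_decode(w, beta=10.0):
        # Branch scores: scaled negative squared error between each weight and each level.
        # Viterbi would take a hard max over paths; log-sum-exp keeps everything smooth.
        emit = -beta * (w[:, None] - levels[None, :]) ** 2          # (T, S)
        T = w.shape[0]
        alphas = [emit[0]]
        for t in range(1, T):                                       # forward pass
            alphas.append(emit[t] + torch.logsumexp(alphas[-1][:, None] + trans, dim=0))
        betas = [torch.zeros(S)]
        for t in range(T - 2, -1, -1):                              # backward pass
            betas.append(torch.logsumexp(trans + (emit[t + 1] + betas[-1])[None, :], dim=1))
        log_post = torch.stack(alphas) + torch.stack(list(reversed(betas)))   # (T, S)
        post = torch.softmax(log_post, dim=1)        # per-step state posteriors
        return post @ levels                         # soft, differentiable reconstruction

    w = torch.tensor([0.8, -0.3, 0.1, -0.9], requires_grad=True)
    soft_trellis_decode(w).sum().backward()
    print(w.grad)    # gradients flow through the decode, unlike a hard Viterbi argmax

As beta grows, the posteriors concentrate on the minimum-distortion path and the soft reconstruction approaches the hard Viterbi decode.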

Summary written by gemini-2.5-flash-lite from 1 source. How we write summaries →

IMPACT Enables more efficient LLM deployment by reducing model size and computational requirements.

RANK_REASON Publication of an academic paper detailing a new method for LLM quantization. [lever_c_demoted from research: ic=1 ai=1.0]

Read on arXiv cs.LG →

COVERAGE [1]

  1. arXiv cs.LG TIER_1 · Venugopalan Iyengar

    BCJR-QAT: A Differentiable Relaxation of Trellis-Coded Weight Quantization

    Trellis-coded quantization sets the current 2-bit post-training frontier for LLMs (QTIP), but pushing below the PTQ ceiling requires quantization-aware training, and QAT on a trellis is obstructed by the non-differentiable Viterbi argmax. We introduce BCJR-QAT, a relaxation that …
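
The abstract above frames the obstacle as the non-differentiable Viterbi argmax. As a rough, assumption-laden sketch of why differentiability matters for quantization-aware training, the layer below runs its forward pass through a soft weight quantizer so training gradients reach the full-precision weights; the per-weight softmax is only a stand-in for the trellis posterior, and every name and value here is hypothetical, not taken from the paper.

    import torch

    class SoftQuantLinear(torch.nn.Module):
        # Hypothetical QAT plumbing: the forward pass uses a soft, differentiable
        # quantization of the weights, so gradients reach the full-precision
        # parameters. The per-weight softmax over levels is a memoryless stand-in
        # for the trellis posterior, not the BCJR-QAT decode itself.
        def __init__(self, in_features, out_features, levels, beta=10.0):
            super().__init__()
            self.weight = torch.nn.Parameter(torch.empty(out_features, in_features))
            torch.nn.init.normal_(self.weight, std=0.02)
            self.register_buffer("levels", levels)   # e.g. 4 levels for 2 bits/weight
            self.beta = beta

        def soft_quant(self, w):
            # soft assignment of each weight to the reconstruction levels
            logits = -self.beta * (w.unsqueeze(-1) - self.levels) ** 2
            post = torch.softmax(logits, dim=-1)
            return (post * self.levels).sum(-1)

        def forward(self, x):
            return torch.nn.functional.linear(x, self.soft_quant(self.weight))

    layer = SoftQuantLinear(8, 4, levels=torch.linspace(-1.0, 1.0, 4))
    out = layer(torch.randn(2, 8))
    out.sum().backward()
    print(layer.weight.grad.shape)    # torch.Size([4, 8]); the quantizer did not block gradients

At inference time the soft decode would be replaced by the hard 2-bit reconstruction; the sketch only shows that the training-time forward pass remains differentiable.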