PulseAugur

New 1.58-bit LLM family achieves 6x inference memory reduction

A new family of large language models, BitCPM-CANN, has been developed using a 1.58-bit ternary quantization technique. The models range from 0.5B to 8B parameters and need roughly one-sixth of the inference memory of their full-precision counterparts. Training was conducted on Huawei Ascend NPUs, and the quantization added little overhead: training throughput dropped by only 5%.
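To make the "1.58-bit" figure concrete, here is a minimal sketch of ternary quantization and the memory arithmetic behind it. The absmean scaling rule, the function names, and the 16-bit baseline are assumptions for illustration; the summary does not describe the exact scheme BitCPM-CANN uses.

```python
import math

# Each ternary weight takes one of three values {-1, 0, +1}, which carries
# log2(3) bits of information. This is where "1.58-bit" comes from.
BITS_PER_TERNARY = math.log2(3)


def ternary_quantize(weights):
    """Map floats to codes in {-1, 0, +1} plus one shared scale.

    Assumption: absmean scaling (divide by the mean absolute value,
    round, then clip). The source does not confirm this rule.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from ternary codes."""
    return [c * scale for c in codes]


def weight_gib(n_params, bits_per_weight):
    """Storage for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


# Hypothetical toy weights, not taken from any real model.
codes, scale = ternary_quantize([0.9, -0.05, -1.2, 0.4, 0.0, -0.45])

# Weights-only comparison for an 8B-parameter model, assuming a 16-bit baseline.
baseline_gib = weight_gib(8e9, 16)
ternary_gib = weight_gib(8e9, BITS_PER_TERNARY)
ideal_ratio = baseline_gib / ternary_gib
```

Under these assumptions the ideal weights-only ratio is 16 / log2(3), about 10x. That is an upper bound on savings from the weights; the roughly 6x figure reported above is lower, and the summary does not break down where the difference comes from.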

IMPACT Enables more efficient LLM deployment with significantly reduced memory footprints.

RANK_REASON Research paper detailing a new quantization technique and resulting models.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. Hugging Face Trending Models TIER_1 Nederlands(NL) · openbmb

    openbmb/BitCPM-CANN-8B

    text-generation · 1,202 downloads · 63 likes