New 1.58-bit LLM family achieves 6x inference memory reduction

By PulseAugur Editorial · [1 source] · 2026-05-15 13:10

A new family of large language models, BitCPM-CANN, has been developed using a novel 1.58-bit ternary quantization technique. The models range from 0.5B to 8B parameters and need roughly one-sixth the inference memory of their full-precision counterparts. Training was conducted on Huawei Ascend NPUs and added little overhead: throughput dropped by only 5%.
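The "1.58-bit" label is the information content of a weight restricted to three values (-1, 0, +1): log2(3) ≈ 1.585 bits. The sketch below illustrates ternary quantization using the absmean scaling published for BitNet b1.58. Whether BitCPM-CANN uses this exact scheme is an assumption; the summary does not say. The sketch also works through idealized weight-memory arithmetic for an 8B-parameter model. The helper names `ternary_quantize` and `weight_memory_gb` are hypothetical.

```python
import math
from typing import List, Tuple

# Bits of information carried by one ternary weight: log2(3) ≈ 1.585.
BITS_PER_TERNARY = math.log2(3)


def ternary_quantize(weights: List[float], eps: float = 1e-8) -> Tuple[List[int], float]:
    """Absmean ternary quantization (BitNet b1.58-style; assumed, not confirmed for BitCPM-CANN)."""
    # Per-tensor scale: mean absolute value of the weights (eps avoids division by zero).
    scale = sum(abs(x) for x in weights) / len(weights) + eps
    # Divide by the scale, round to the nearest integer, then clip to {-1, 0, +1}.
    q = [max(-1, min(1, round(x / scale))) for x in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    # Approximate reconstruction: each ternary code times the shared scale.
    return [v * scale for v in q]


def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    # Idealized weight storage in decimal GB; ignores activations, KV cache and metadata.
    return n_params * bits_per_weight / 8 / 1e9


# Toy weight tensor (illustrative values only).
w = [0.31, -0.02, -0.47, 0.05, 0.12, -0.26]
q, scale = ternary_quantize(w)
print("ternary codes:", q, "scale:", round(scale, 4))
print("reconstruction:", [round(x, 3) for x in dequantize(q, scale)])

# Idealized weight memory for 8B parameters under different storage widths.
n = 8e9
fp16 = weight_memory_gb(n, 16)                   # assumed 16-bit baseline
ternary_ideal = weight_memory_gb(n, BITS_PER_TERNARY)  # exactly log2(3) bits per weight
packed_2bit = weight_memory_gb(n, 2)             # simple 2-bit packing of ternary codes
print(f"FP16: {fp16:.2f} GB, ideal ternary: {ternary_ideal:.2f} GB, 2-bit packed: {packed_2bit:.2f} GB")
print(f"ratios vs FP16: {fp16 / ternary_ideal:.1f}x ideal, {fp16 / packed_2bit:.1f}x packed")
```

Against an assumed 16-bit baseline (the summary does not name the full-precision format), storing weights at exactly log2(3) bits would give about a 10x reduction, and plain 2-bit packing would give 8x. The reported figure of roughly 6x is for inference overall, and the summary does not explain how the remaining memory is accounted for.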

IMPACT Enables more efficient LLM deployment with significantly reduced memory footprints.

RANK_REASON Research paper detailing a new quantization technique and resulting models.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

Hugging Face Trending Models · TIER_1 · Dutch (NL) · openbmb · 2026-05-15 13:10

openbmb/BitCPM-CANN-8B

text-generation · 1,202 downloads · 63 likes
