PulseAugur

OpenPangu LLM quantization on Ascend NPUs: 8-bit is nearly lossless, 4-bit degrades the 1B model

A new study evaluates post-training quantization methods for the OpenPangu large language models deployed on Ascend NPUs. The researchers found that 8-bit weight-only quantization is nearly lossless for both the 1B and 7B parameter models. 4-bit quantization degrades the 1B model more noticeably, particularly on reasoning and coding tasks, while remaining practical for the 7B model. Ultra-low precision remains difficult: most 2-bit and binary settings yield near-random performance.
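To make the bit-width trade-off concrete, here is a minimal sketch of symmetric round-to-nearest weight-only quantization applied to one group of weights. It is an illustration under stated assumptions, not the method or settings used in the study: the function names, the single per-group scale, and the Gaussian stand-in weights are all hypothetical, and the round-trip weight error it prints is not a task accuracy figure.

```python
import random


def quantize_dequantize(weights, bits):
    """Symmetric round-to-nearest quantization of one weight group, then dequantization."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1            # 127 for 8-bit, 7 for 4-bit, 1 for 2-bit
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)              # nothing to quantize
    scale = max_abs / qmax                # one floating-point scale per group
    out = []
    for w in weights:
        q = round(w / scale)              # integer code
        q = max(-qmax, min(qmax, q))      # clamp to the representable range
        out.append(q * scale)             # map back to floating point
    return out


def relative_error(weights, bits):
    """RMS error of the quantization round trip, relative to the RMS of the weights."""
    approx = quantize_dequantize(weights, bits)
    err = sum((w - a) ** 2 for w, a in zip(weights, approx))
    ref = sum(w ** 2 for w in weights)
    return (err / ref) ** 0.5


rng = random.Random(0)
# Hypothetical stand-in for one group of weights; these are not OpenPangu weights.
group = [rng.gauss(0.0, 0.02) for _ in range(4096)]
errors = {bits: relative_error(group, bits) for bits in (8, 4, 2)}
for bits, err in errors.items():
    print(f"{bits}-bit relative RMS error: {err:.4f}")
```

Each bit removed roughly doubles the spacing between representable values, so the round-trip error grows quickly as precision drops. How that weight error translates into benchmark loss depends on the model and the quantization method, which is what the study measures.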

IMPACT Provides an NPU-oriented accuracy map for selecting OpenPangu quantization settings, aiding efficient domestic LLM deployment.

RANK_REASON The cluster contains an academic paper detailing empirical research on model quantization techniques.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. arXiv cs.AI TIER_1 English(EN) · Tong Shi, Jiacheng Wang, Hui Xie, Ying Li, Aishan Liu, Jinyang Guo, Xianglong Liu

    An Empirical Study of OpenPangu Quantization on Ascend NPUs

    arXiv:2606.21257v2 Announce Type: replace-cross Abstract: OpenPangu models are attractive targets for private and domestic large-language-model deployment, yet their robustness under aggressive post-training quantization on Ascend NPUs has not been systematically characterized. T…