A new study investigates the effectiveness of various post-training quantization methods for the openPangu large language models when deployed on Ascend NPUs. Researchers found that 8-bit weight-only quantization is nearly lossless for both the 1B and 7B parameter models. However, 4-bit quantization degrades the 1B model more noticeably, particularly on reasoning and coding tasks, while remaining practical for the 7B model. The study also highlights the challenges of ultra-low-precision quantization, with most 2-bit and binary settings resulting in near-random performance.
AI IMPACT Provides an NPU-oriented accuracy map for selecting openPangu quantization settings, aiding efficient domestic LLM deployment.
RANK_REASON The cluster contains an academic paper detailing empirical research on model quantization techniques.
- Activation Aware Quantization
- Ascend NPUs
- GPTAQ
- GPTQ
- Huawei Ascend 910B1
- openPangu
- SliM-LLM
- SmoothQuant
- Tong Shi
AI-generated summary · Google Gemini · from 1 source.