SmoothQuant
PulseAugur coverage of SmoothQuant — every cluster mentioning SmoothQuant across labs, papers, and developer communities, ranked by signal.
-
OpenPangu LLM quantization on Ascend NPUs shows 8-bit is lossless, 4-bit degrades 1B model
A new study investigates the effectiveness of various post-training quantization methods for the OpenPangu large language models when deployed on Ascend NPUs. Researchers found that 8-bit weight-only quantization is nea…
-
llmcompressor tool enables LLM compression via FP8, GPTQ, SmoothQuant
A new open-source tool named llmcompressor allows developers to compress and benchmark instruction-tuned large language models. The tool demonstrates how to apply post-training quantization techniques such as FP8, GPTQ,…
-
Optimizing Transformer Inference: Techniques for Faster, Cheaper Large Models
Large transformer models are costly to serve: they have a substantial memory footprint, and the compute cost of attention scales quadratically with input length. Researchers and practitioners are exploring…