PulseAugur

New 4/6 quantization method boosts LLM accuracy with adaptive scaling

Researchers have developed a quantization method called Four Over Six (4/6) that improves the accuracy of low-precision numerical formats such as NVFP4 for large language models. The technique adaptively scales blocks to smaller FP4 values, which reduces quantization error, particularly for near-maximal values. In experiments with the Nemotron 3 Nano 30B-A3B model architecture, 4/6 brought training loss closer to the BF16 baseline than existing NVFP4 methods did, with minimal computational overhead.
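To make the idea of adaptive block scaling concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the standard FP4 (E2M1) magnitude grid, keeps the per-block scale in full precision (NVFP4 itself stores block scales in FP8), and picks between the two candidate scalings by mean squared error. The function names are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: names and the MSE selection rule are assumptions,
# and the block scale is kept in full precision instead of FP8.

# Magnitudes representable in FP4 (E2M1). Note the wide gap between 4 and 6.
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_block(block, target_max):
    """Scale the block so its largest magnitude maps to target_max,
    round each value to the nearest FP4 magnitude, then scale back."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return list(block)
    scale = amax / target_max
    out = []
    for x in block:
        mag = min(FP4_GRID, key=lambda g: abs(abs(x) / scale - g))
        out.append(mag * scale if x >= 0 else -mag * scale)
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def quantize_block_adaptive(block):
    """Try mapping the block maximum to 6 and to 4; keep the lower-error result.
    Returns (chosen_target_max, dequantized_block)."""
    candidates = [(t, quantize_block(block, t)) for t in (6.0, 4.0)]
    return min(candidates, key=lambda c: mse(block, c[1]))


if __name__ == "__main__":
    # 7.5 is 75% of the block maximum: with the maximum mapped to 6 it lands
    # at 4.5, in the gap between 4 and 6; with the maximum mapped to 4 it
    # lands exactly on 3.
    block = [10.0, 7.5, 5.0, 2.5]
    target, deq = quantize_block_adaptive(block)
    print(target, deq)
```

In this toy block, mapping the maximum to 4 represents every value exactly, while mapping it to 6 rounds 7.5 down to about 6.67. A block such as `[6.0, 0.5, 1.5, 3.0]` goes the other way and keeps the scaling to 6, which is why the choice is made per block.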

Impact: Improves the efficiency of LLMs by reducing memory usage and increasing speed with minimal accuracy loss.

Ranking rationale: Academic paper detailing a new method for model quantization.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 1 source.


Sources [1]

  1. arXiv cs.LG · Jack Cook, Junxian Guo, Guangxuan Xiao, Yujun Lin, Song Han

    Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling

    arXiv:2512.02010v4. Abstract: As large language models have grown larger, interest has grown in low-precision numerical formats such as NVFP4 as a way to improve speed and reduce memory usage. However, quantizing models to NVFP4 remains challenging as …