
New 4/6 quantization method boosts LLM accuracy with adaptive scaling

Researchers have developed a new quantization method called Four Over Six (4/6) to improve the accuracy of low-precision numerical formats like NVFP4 for large language models. Rather than always scaling each block so its largest value maps to the FP4 maximum of 6, the technique adaptively rescales some blocks to the smaller FP4 target of 4, reducing quantization error, particularly for near-maximal values. Experiments with the Nemotron 3 Nano 30B-A3B model architecture showed that 4/6 brings training loss closer to BF16 than existing NVFP4 methods, with minimal computational overhead.
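
A minimal sketch of the idea, assuming NVFP4's 16-element blocks and the E2M1 (FP4) value set {0, 0.5, 1, 1.5, 2, 3, 4, 6}: for each block, try scaling its maximum to 6 (standard NVFP4) and to 4, and keep whichever choice reconstructs the block with lower error. The function names, the per-block squared-error criterion, and the NumPy implementation are illustrative assumptions, not the paper's actual code.

    import numpy as np

    # Representable FP4 (E2M1) magnitudes used by NVFP4.
    FP4_LEVELS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

    def quantize_fp4(block, scale):
        # Divide by the block scale, snap each magnitude to the
        # nearest FP4 level, then rescale back.
        scaled = block / scale
        idx = np.abs(np.abs(scaled)[:, None] - FP4_LEVELS[None, :]).argmin(axis=1)
        return np.sign(scaled) * FP4_LEVELS[idx] * scale

    def four_over_six(block):
        # Candidate scales map the block's max magnitude to 6 (standard)
        # or to 4; keep whichever reconstruction has lower squared error.
        # (Hypothetical selection rule; the paper's criterion may differ.)
        amax = np.abs(block).max()
        best = None
        for target in (6.0, 4.0):
            scale = amax / target if amax > 0 else 1.0
            deq = quantize_fp4(block, scale)
            err = float(np.square(block - deq).sum())
            if best is None or err < best[0]:
                best = (err, deq, target)
        return best

    # Toy 16-element block containing a near-maximal value.
    rng = np.random.default_rng(0)
    block = rng.normal(size=16)
    block[3] = np.abs(block).max() * 0.95
    err, deq, target = four_over_six(block)
    print(f"scaled block max to {target}, squared error {err:.5f}")

Scaling to 4 gives up the two largest FP4 levels but tightens the spacing of the remaining ones, which is where the gap between levels 4 and 6 would otherwise dominate; that is the near-maximal-value error the summary refers to.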

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Improves efficiency of LLMs by reducing memory usage and increasing speed with minimal accuracy loss.

RANK_REASON Academic paper detailing a new method for model quantization.

Read on arXiv cs.LG →

COVERAGE [1]

  1. arXiv cs.LG TIER_1 · Jack Cook, Junxian Guo, Guangxuan Xiao, Yujun Lin, Song Han

    Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling

    arXiv:2512.02010v4 Announce Type: replace-cross Abstract: As large language models have grown larger, interest has grown in low-precision numerical formats such as NVFP4 as a way to improve speed and reduce memory usage. However, quantizing models to NVFP4 remains challenging as …