NVFP4
PulseAugur coverage of NVFP4 — every cluster mentioning NVFP4 across labs, papers, and developer communities, ranked by signal.
-
SOAR framework boosts accuracy of NVFP4-quantized LLMs
Researchers have introduced SOAR, a new post-training quantization framework designed to enhance the accuracy of NVFP4 quantization for large language models. SOAR employs Closed-form Joint Scale Optimization (CJSO) to …
-
New number formats boost AI direction preservation
Researchers have developed a new geometric framework to analyze how well low-precision number formats in machine learning preserve vector direction. The study analytically quantifies the suboptimality of standard format…
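The paper's exact framework isn't reproduced in the truncated summary, but the quantity it studies is easy to illustrate: how much a vector's direction rotates when its coordinates are snapped to a low-precision grid. The sketch below uses a simplified per-tensor-scaled FP4 quantizer (NVFP4's E2M1 element grid, without its block scales) as a stand-in; `quantize_fp4` is an illustrative helper, not the paper's method.

```python
import numpy as np

# Representable magnitudes of the E2M1 (FP4) element format used by NVFP4.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(x: np.ndarray) -> np.ndarray:
    """Round each element to the nearest FP4 grid value after scaling
    the vector so its max magnitude maps to 6.0 (absmax scaling)."""
    scale = np.abs(x).max() / FP4_GRID[-1]
    idx = np.abs(np.abs(x / scale)[:, None] - FP4_GRID).argmin(axis=1)
    return np.sign(x) * FP4_GRID[idx] * scale

rng = np.random.default_rng(0)
v = rng.standard_normal(4096)
q = quantize_fp4(v)

# Direction preservation: cosine similarity and the induced angular error.
cos = np.dot(v, q) / (np.linalg.norm(v) * np.linalg.norm(q))
print(f"cosine similarity: {cos:.6f}")
print(f"angular error: {np.degrees(np.arccos(np.clip(cos, -1, 1))):.3f} deg")
```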
-
New 4/6 quantization method boosts LLM accuracy with adaptive scaling
Researchers have developed a new quantization method called Four Over Six (4/6) to improve the accuracy of low-precision numerical formats like NVFP4 for large language models. This technique adaptively scales blocks to…
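The summary is cut off before the details of 4/6's adaptive scaling, but the baseline it improves on is NVFP4's published layout: E2M1 elements grouped into 16-wide blocks, each block carrying its own scale factor. The sketch below implements that baseline with round-to-nearest absmax scaling; it is not the 4/6 method itself, and it omits NVFP4's quantization of the scales themselves to FP8.

```python
import numpy as np

FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # E2M1 magnitudes
BLOCK = 16  # NVFP4 groups elements into 16-wide blocks, each with its own scale

def quantize_nvfp4_blocks(x: np.ndarray):
    """Per-block absmax scaling, then round-to-nearest onto the FP4 grid.
    Returns the dequantized tensor and the per-block scales."""
    blocks = x.reshape(-1, BLOCK)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scales[scales == 0] = 1.0                    # avoid div-by-zero on all-zero blocks
    idx = np.abs(np.abs(blocks / scales)[..., None] - FP4_GRID).argmin(axis=-1)
    deq = np.sign(blocks) * FP4_GRID[idx] * scales
    return deq.reshape(x.shape), scales.ravel()

w = np.random.default_rng(1).standard_normal(64)
w_hat, s = quantize_nvfp4_blocks(w)
print("max abs error:", np.abs(w - w_hat).max())
```

An adaptive-scaling method like 4/6 would choose each block's scale differently from the fixed absmax mapping above; the truncated summary doesn't specify how, so the paper is the reference for that step.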
-
Qwen3.6-35B quantization tests show FP8 quality worse than INT8; user calls NVFP4 "a lie"
A user on Reddit's LocalLLaMA community shared findings on the Qwen3.6-35B model, focusing on Kullback-Leibler divergence (KLD) metrics for different quantization formats like INT8, FP8, and NVFP4. The analysis, conduct…
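KLD here measures how far a quantized model's next-token distribution drifts from the full-precision model's on the same input; lower is better. A minimal sketch of the metric, with random arrays standing in for real per-token logits (in practice the statistic is averaged over many tokens of evaluation text):

```python
import numpy as np

def kl_divergence(p_logits: np.ndarray, q_logits: np.ndarray) -> float:
    """KL(P || Q) between softmax distributions over the vocabulary.
    P is the full-precision reference; Q is the quantized model."""
    p = np.exp(p_logits - p_logits.max()); p /= p.sum()
    q = np.exp(q_logits - q_logits.max()); q /= q.sum()
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

# Hypothetical logits for one token position; real values would come from
# running the reference and quantized models over identical evaluation text.
ref = np.random.default_rng(2).standard_normal(32000)
quant = ref + 0.05 * np.random.default_rng(3).standard_normal(32000)
print(f"KLD: {kl_divergence(ref, quant):.6f}")
```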
-
llama.cpp and ik_llama.cpp add FP4 inference support for VRAM savings
The llama.cpp and ik_llama.cpp projects have both integrated support for FP4 (4-bit floating-point) inference, which stores weights in roughly a quarter of the memory FP16 requires. llama.cpp now includes NVFP4, an Nvidia-specific format, w…
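The VRAM saving on weights follows directly from the bit widths. The estimate below assumes NVFP4's published layout (one 8-bit scale per 16-element block, so about 4.5 bits per weight) and ignores KV cache, activations, and per-tensor metadata; the 35B parameter count is just an example.

```python
def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Weight-storage footprint in GiB for a given per-weight bit width."""
    return n_params * bits_per_weight / 8 / 2**30

N = 35e9  # e.g. a 35B-parameter model
print(f"FP16:  {weight_gib(N, 16):.1f} GiB")
print(f"FP8:   {weight_gib(N, 8):.1f} GiB")
# NVFP4: 4 bits per element + one 8-bit scale shared by each 16-element block
print(f"NVFP4: {weight_gib(N, 4 + 8 / 16):.1f} GiB")
```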