Brief · PulseAugur

TOOL · r/LocalLLaMA English(EN) · 3h

Here is my llama.cpp NVFP4/MXFP6 GGUF quantizer tool

A developer has released a quantizer tool for llama.cpp that creates NVFP4 and MXFP6 GGUF models. Rather than applying a single quantization method, the tool evaluates several and adds custom techniques such as RSF (Refined Scale Fitting) to optimize model performance. It scores layers individually using metrics such as perplexity and KL divergence (KLD), handles sensitive tensors conservatively, and promotes them to higher precision when justified. The project also includes a new MXFP6 CUDA implementation for NVIDIA's Blackwell architecture.
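To make the per-layer scoring idea concrete, here is a minimal sketch of how a KLD-based promotion decision could look. It is not the tool's actual code: the function names, the `"nvfp4"` and `"promote"` labels, and the 0.01 threshold are all hypothetical, and the post does not describe how the tool combines KLD with perplexity.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one token's logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def kl_divergence(ref_logits, test_logits):
    """KL(P_ref || P_test) in nats for a single token position."""
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(test_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(log_p, log_q))

def mean_kld(ref_batch, test_batch):
    """Average KLD over token positions (full-precision vs. quantized)."""
    pairs = list(zip(ref_batch, test_batch))
    return sum(kl_divergence(r, t) for r, t in pairs) / len(pairs)

def choose_precision(layer_kld, threshold=0.01):
    """Hypothetical rule: promote a layer when its KLD exceeds the threshold.

    layer_kld maps a layer name to the mean KLD measured with only that
    layer quantized. The threshold is an arbitrary illustrative value.
    """
    return {
        name: ("promote" if kld > threshold else "nvfp4")
        for name, kld in layer_kld.items()
    }

if __name__ == "__main__":
    # Toy logits standing in for real model outputs (assumed data).
    reference = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
    quantized = [[1.9, 1.1, 0.1], [0.4, 2.6, 0.2]]
    print(mean_kld(reference, quantized))
    print(choose_precision({"blk.0.attn_q": 0.002, "blk.0.ffn_down": 0.05}))
```

In this sketch a layer stays in the low-precision format unless its measured divergence from the full-precision output crosses the threshold, which mirrors the "promote only when justified" behavior described above.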

IMPACT Enables more efficient local LLM deployment by improving quantization quality for NVFP4 and MXFP6 GGUF models.

Hugging Face
NVIDIA
llama.cpp
GGUF
NVFP4
Blackwell
MXFP6
advanced-quantizer-tool
Qwopus3.6-27B-v2-MTP-NVFP4-GGUF
Qwen3.6-27B-NVFP4-MTP-GGUF
michaelw9999
ModelOpt