NVIDIA has developed a new methodology for pretraining in NVFP4, its 4-bit floating-point format. The methodology is designed to overcome the reduced dynamic range and increased quantization error that come with narrower floating-point formats. It was validated by pretraining a 12-billion-parameter hybrid Mamba-Transformer model on 10 trillion tokens, the longest publicly documented 4-bit training run to date. The resulting model performed nearly identically to an FP8 baseline on the MMLU-Pro benchmark, demonstrating that NVFP4 is viable for large-scale model training.
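To make "reduced dynamic range and increased quantization error" concrete, here is a minimal sketch of block-scaled 4-bit quantization. It assumes a 4-bit E2M1 element format (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) with one scale per 16 consecutive values. It also keeps that scale in full precision; the actual NVFP4 format stores it at lower precision (FP8) and adds a per-tensor scale. The helper names (`round_to_e2m1`, `fake_quantize`, `mean_abs_error`) and the data are hypothetical, and the sketch illustrates only the number format, not NVIDIA's training recipe.

```python
# Simplified NVFP4-style "fake quantization": quantize to 4-bit E2M1, then dequantize.
# Assumptions: one scale per 16-value block, scale kept in full precision
# (the real format stores block scales in FP8 and adds a per-tensor scale).

E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # non-negative E2M1 values
E2M1_MAX = 6.0
BLOCK_SIZE = 16


def round_to_e2m1(x: float) -> float:
    """Round x to the nearest E2M1 value; ties go to the smaller magnitude (a simplification)."""
    magnitude = min(E2M1_MAGNITUDES, key=lambda v: abs(abs(x) - v))
    return magnitude if x >= 0 else -magnitude


def fake_quantize(values: list[float], block_size: int = BLOCK_SIZE) -> list[float]:
    """Quantize each block with its own scale, then dequantize back to floats."""
    out: list[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        # Map the block's largest magnitude onto the largest E2M1 value (6.0).
        scale = amax / E2M1_MAX if amax > 0 else 1.0
        out.extend(round_to_e2m1(v / scale) * scale for v in block)
    return out


def mean_abs_error(a: list[float], b: list[float]) -> float:
    """Average absolute difference between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Hypothetical data: a block containing an outlier, then a block of moderate values.
    data = [600.0] + [1.0] * 15 + [3.0] * 16
    per_block = fake_quantize(data)                         # one scale per 16 values
    per_tensor = fake_quantize(data, block_size=len(data))  # one scale for all 32 values
    print(f"{mean_abs_error(data, per_block):.5f}")   # 0.46875
    print(f"{mean_abs_error(data, per_tensor):.5f}")  # 1.96875
```

With a single scale, the outlier forces every small value to round to zero. Per-block scaling confines that loss to the outlier's own block, so the second block is reconstructed exactly. The small values that share a block with the outlier are still lost, which shows why 4-bit training needs more than scaling alone.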
IMPACT Enables more efficient training of large language models by reducing precision requirements without significant performance loss.
RANK_REASON The cluster describes a new pretraining methodology and its validation on a large model, presented as a research finding.
AI-generated summary · Google Gemini · from 3 sources.