NVIDIA unveils 4-bit pretraining method, NVFP4, for LLMs

By PulseAugur Editorial · [3 sources] · 2026-05-18 08:42

NVIDIA has developed a 4-bit pretraining methodology built around its NVFP4 format, designed to overcome the reduced dynamic range and increased quantization error of narrower floating-point formats. The method was validated by pretraining a 12-billion-parameter hybrid Mamba-Transformer model on 10 trillion tokens, the longest publicly documented training run in 4-bit precision to date. The resulting model scored nearly the same as an FP8 baseline on the MMLU-Pro benchmark (62.58% versus 62.62%), supporting the viability of NVFP4 for large-scale model training.
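
To make the dynamic-range problem concrete, here is a minimal sketch of block-scaled 4-bit quantization in plain Python. It is an illustration under stated assumptions, not NVIDIA's implementation: the grid is the set of magnitudes a 4-bit E2M1 float can represent, the block size of 16 is an assumption not stated in the coverage, and the shared scale is kept as an ordinary Python float rather than emulating any particular scale encoding. All function names are hypothetical.

```python
# Magnitudes representable by a 4-bit E2M1 float (sign stored separately).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_MAX = FP4_GRID[-1]
BLOCK = 16  # assumed number of elements sharing one scale


def round_to_grid(x):
    """Round an already-scaled value to the nearest representable magnitude."""
    mag = min(abs(x), FP4_MAX)
    nearest = min(FP4_GRID, key=lambda g: abs(g - mag))
    return nearest if x >= 0 else -nearest


def quantize_block(block):
    """Quantize one block with a shared scale and return dequantized values."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block)
    scale = amax / FP4_MAX  # full-precision scale: a simplification
    return [round_to_grid(v / scale) * scale for v in block]


def quantize(values, block=BLOCK):
    """Quantize a flat list block by block."""
    out = []
    for i in range(0, len(values), block):
        out.extend(quantize_block(values[i:i + block]))
    return out


def max_abs_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


# Sixteen small values followed by sixteen large ones.
values = [0.01 * i for i in range(16)] + [10.0 * i for i in range(16)]
per_block = quantize(values, block=16)   # one scale per 16 elements
per_tensor = quantize(values, block=32)  # a single scale for everything

print("small-value error, per-block scale: ", max_abs_error(values[:16], per_block[:16]))
print("small-value error, per-tensor scale:", max_abs_error(values[:16], per_tensor[:16]))
```

With a single scale, the large values set the range and the small values all collapse to zero. A scale per block lets the eight available magnitudes cover each block's own range, which is the general idea behind microscaling formats.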

IMPACT Enables more efficient training of large language models by reducing precision requirements without significant performance loss.

RANK_REASON The cluster describes a new pretraining methodology and its validation on a large model, presented as a research finding.


infra
paper

AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

MarkTechPost TIER_1 English(EN) · Asif Razzaq · 2026-05-18 08:42

NVIDIA Introduces a 4-Bit Pretraining Methodology Using NVFP4, Validated on a 12B Hybrid Mamba-Transformer at 10T Token Horizon

NVIDIA introduces a 4-bit pretraining methodology built around the NVFP4 microscaling format (combining selective BF16 layers, 16×16 Random Hadamard Transforms on Wgrad inputs, 2D weight scaling, and stochastic rounding on gradients), validated on a 12B hybrid Mamba-Transform…

Mastodon — sigmoid.social TIER_1 English(EN) · [email protected] · 2026-05-18 12:52

NVIDIA has introduced a 4-bit pretraining methodology using NVFP4, validated on a 12-billion-parameter hybrid Mamba-Transformer trained on 10 trillion tokens, the longest publicly documented 4-bit pretraining run. Accuracy closely matches the FP8 baseline at 62.58% versus 62.62%…

LINKS marktechpost.com/…/nvidia-introduces-a-4-…

Mastodon — fosstodon.org TIER_1 Polski(PL) · [email protected] · 2026-05-19 11:17

NVIDIA has proven that training models on 10 trillion tokens in 4-bit NVFP4 precision does not cause a drop in quality. This is the foundation for the Blackwell architecture and an opportunity to radically reduce AI training costs. #si #ai #sztucznainteligencja #wiadomości #informacje #techn…

LINKS aisight.pl/…/nvidia-4-bity-trenowanie-gig…
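
Two of the ingredients named in the MarkTechPost coverage, stochastic rounding and the 16×16 Random Hadamard Transform, can be sketched in a few lines of plain Python. This is an illustrative sketch only: the coverage does not describe NVIDIA's kernels, so the function names, the E2M1 value grid, and the sign-flip construction of the "random" part are assumptions made here for demonstration.

```python
import bisect
import math
import random

# Magnitudes representable by a 4-bit E2M1 float (assumed target grid).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def stochastic_round(x, rng):
    """Round to a neighbouring grid point with probability proportional to
    proximity, so the result equals x in expectation (within grid range)."""
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP4_GRID[-1])
    hi_idx = bisect.bisect_left(FP4_GRID, mag)
    if FP4_GRID[hi_idx] == mag:
        return sign * mag  # already representable
    lo, hi = FP4_GRID[hi_idx - 1], FP4_GRID[hi_idx]
    p_up = (mag - lo) / (hi - lo)
    return sign * (hi if rng.random() < p_up else lo)


def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of 2)."""
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def random_hadamard_transform(x, signs):
    """Flip signs elementwise, then apply the orthonormal Hadamard matrix."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    flipped = [s * v for s, v in zip(signs, x)]
    return [scale * sum(a * b for a, b in zip(row, flipped)) for row in hadamard(n)]


rng = random.Random(0)

# Nearest rounding always maps 2.25 to 2.0; stochastic rounding averages out.
samples = [stochastic_round(2.25, rng) for _ in range(20000)]
print("mean of stochastic roundings of 2.25:", sum(samples) / len(samples))

# A single outlier in a 16-element vector is spread evenly across all outputs.
signs = [rng.choice([-1.0, 1.0]) for _ in range(16)]
outlier = [0.0] * 15 + [16.0]
print("transformed outlier:", random_hadamard_transform(outlier, signs))
```

The first demo shows why stochastic rounding is attractive for gradients: individual values are still coarse, but the rounding error does not accumulate as a systematic bias. The second shows how the transform redistributes an outlier so that no single element dominates a block's range before quantization.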
