PulseAugur

NVIDIA unveils 4-bit pretraining method for LLMs built on the NVFP4 format

NVIDIA has developed a 4-bit pretraining methodology built around NVFP4, a 4-bit microscaling floating-point format. The method is designed to overcome the reduced dynamic range and increased quantization error that come with narrower floating-point formats. NVIDIA validated it by pretraining a 12-billion-parameter hybrid Mamba-Transformer model on 10 trillion tokens, the longest publicly documented 4-bit pretraining run to date. The resulting model performed nearly identically to an FP8 baseline on the MMLU-Pro benchmark, indicating that NVFP4 is viable for large-scale model training.
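The dynamic-range problem mentioned above is easiest to see in a small simulation. The sketch below fake-quantizes values the way NVFP4 is publicly described: FP4 (E2M1) elements, whose representable magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4 and 6, sharing one scale factor per 16-element block. It is a simplification under stated assumptions, not NVIDIA's code: the block scale stays in full precision (NVFP4 stores it in FP8 E4M3, alongside a per-tensor FP32 scale), ties round toward the smaller magnitude, and the function names `round_to_e2m1`, `quantize_block` and `quantize_tensor` are hypothetical.

```python
import math

# FP4 E2M1 representable magnitudes; the sign bit is handled separately.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_MAX = E2M1_GRID[-1]
BLOCK = 16  # NVFP4 shares one scale factor per 16-element block


def round_to_e2m1(x: float) -> float:
    """Round to the nearest signed E2M1 value, saturating at +/-6.

    Ties go to the smaller magnitude here (min() keeps the first match in
    the ascending grid), a simplification of hardware rounding modes.
    """
    mag = min(abs(x), FP4_MAX)
    nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
    return math.copysign(nearest, x)


def quantize_block(block: list[float]) -> tuple[list[float], float]:
    """Fake-quantize one block: map max |x| to 6.0, round each element, rescale.

    Assumption: the scale is kept in full precision; NVFP4 itself stores it
    as FP8 (E4M3) together with a per-tensor FP32 scale.
    """
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block), 1.0
    scale = amax / FP4_MAX
    return [round_to_e2m1(v / scale) * scale for v in block], scale


def quantize_tensor(values: list[float]) -> list[float]:
    """Quantize a flat list block by block, each block with its own scale."""
    out: list[float] = []
    for start in range(0, len(values), BLOCK):
        deq, _ = quantize_block(values[start:start + BLOCK])
        out.extend(deq)
    return out


if __name__ == "__main__":
    # Block 1 holds only 1.0s; block 2 holds one outlier (100.0) plus fifteen 1.0s.
    data = [1.0] * 16 + [100.0] + [1.0] * 15
    out = quantize_tensor(data)
    print("block 1:", out[:3])
    print("block 2:", out[16:19])
```

Running it shows the trade-off the per-block scale manages: the outlier in the second block forces a coarse scale there, so the 1.0 entries sharing that block round to zero, while the first block, with its own scale, keeps them. Smaller blocks limit how far a single outlier's influence reaches.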

IMPACT Enables more efficient training of large language models by lowering numerical precision to 4 bits without significant loss in accuracy.

RANK_REASON The cluster describes a new pretraining methodology and its validation on a large model, presented as a research finding.


AI-generated summary · Google Gemini · from 3 sources.


COVERAGE [3]

  1. MarkTechPost TIER_1 English(EN) · Asif Razzaq ·

    NVIDIA Introduces a 4-Bit Pretraining Methodology Using NVFP4, Validated on a 12B Hybrid Mamba-Transformer at 10T Token Horizon

    NVIDIA introduces a 4-bit pretraining methodology built around the NVFP4 microscaling format (combining selective BF16 layers, 16×16 Random Hadamard Transforms on Wgrad inputs, 2D weight scaling, and stochastic rounding on gradients), validated on a 12B hybrid Mamba-Transform…

  2. Mastodon — sigmoid.social TIER_1 English(EN) · [email protected] ·

    NVIDIA has introduced a 4-bit pretraining methodology using NVFP4, validated on a 12-billion-parameter hybrid Mamba-Transformer trained on 10 trillion tokens

    NVIDIA has introduced a 4-bit pretraining methodology using NVFP4, validated on a 12-billion-parameter hybrid Mamba-Transformer trained on 10 trillion tokens, the longest publicly documented 4-bit pretraining run. Accuracy closely matches the FP8 baseline at 62.58% versus 62.62%…

  3. Mastodon — fosstodon.org TIER_1 Polish(PL) · [email protected] ·

    NVIDIA has proven that training models on 10 trillion tokens in 4-bit NVFP4 precision does not cause a drop in quality. This is the foundation for the Blackwell architecture

    NVIDIA has demonstrated that training models on 10 trillion tokens in 4-bit NVFP4 precision does not cause a drop in quality. This is the foundation for the Blackwell architecture and a chance to radically lower the cost of AI training. #si #ai #sztucznainteligencja #wiadomości #informacje #techn…
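The MarkTechPost excerpt lists the recipe's ingredients. Two of them can be illustrated in plain Python, assuming the textbook constructions rather than NVIDIA's kernels: a 16×16 random Hadamard transform (a Sylvester Hadamard matrix times random ±1 column signs, scaled to be orthonormal) and stochastic rounding onto the E2M1 grid. The transform spreads a single outlier evenly across the block, and because it is orthonormal, applying it to both operands of a dot product leaves the result unchanged. Stochastic rounding is unbiased in expectation, which matters when many small gradient values would otherwise all round the same way. The function names are hypothetical, and the sources do not specify how the sign vector is sampled or exactly where the transform sits in the Wgrad computation.

```python
import random

# FP4 E2M1 magnitudes; the sign is handled separately.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def random_hadamard(n, rng):
    """Orthonormal randomized Hadamard matrix: H @ diag(random signs) / sqrt(n)."""
    h = hadamard(n)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    s = n ** -0.5
    return [[h[i][j] * signs[j] * s for j in range(n)] for i in range(n)]


def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def stochastic_round_e2m1(x, rng):
    """Round to one of the two neighbouring E2M1 values with probability
    proportional to proximity, so the expected result equals x
    (for |x| <= 6; larger magnitudes are clamped to 6)."""
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E2M1_GRID[-1])
    for lo, hi in zip(E2M1_GRID, E2M1_GRID[1:]):
        if mag <= hi:
            p_up = (mag - lo) / (hi - lo)
            return sign * (hi if rng.random() < p_up else lo)
    raise AssertionError("unreachable: mag is clamped to the grid range")


if __name__ == "__main__":
    rng = random.Random(0)
    q = random_hadamard(16, rng)

    # A single outlier becomes 16 entries of equal magnitude 10 / sqrt(16).
    y = matvec(q, [10.0] + [0.0] * 15)
    print(max(abs(v) for v in y))  # prints 2.5

    # Orthonormal, so the transpose undoes the transform.
    back = matvec(transpose(q), y)
    print(round(back[0], 9))  # prints 10.0

    # Dot products survive the transform (equal up to float rounding).
    a = [rng.gauss(0.0, 1.0) for _ in range(16)]
    b = [rng.gauss(0.0, 1.0) for _ in range(16)]
    print(sum(p * r for p, r in zip(a, b)))
    print(sum(p * r for p, r in zip(matvec(q, a), matvec(q, b))))

    # 2.4 rounds to 3.0 with probability 0.4, else 2.0, so the mean stays near 2.4,
    # whereas round-to-nearest would always return 2.0.
    samples = [stochastic_round_e2m1(2.4, rng) for _ in range(100_000)]
    print(sum(samples) / len(samples))
```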