Brief · PulseAugur

RESEARCH · arXiv cs.AI

Rethinking Shrinkage Bias in LLM FP4 Pretraining: Geometric Origin, Systemic Impact, and UFP4 Recipe

A new research paper introduces UFP4, a uniform 4-bit training recipe designed to address shrinkage bias in large language model (LLM) pretraining. The study finds that non-uniform FP4 formats such as E2M1, which is used on NVIDIA Blackwell/Rubin and AMD MI350 GPUs, introduce systematic rounding errors. UFP4 instead uses uniform grids (E1M2/INT4) to improve quantization quality, and it shows lower loss degradation than existing E2M1-based methods across several model sizes.
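To make the non-uniform versus uniform distinction concrete, here is a minimal Python sketch. It is not taken from the paper. It lists the standard E2M1 magnitudes next to an equally spaced 8-level grid and rounds values to the nearest level. The names `UNIFORM_GRID`, `round_to_nearest`, and `step_sizes` are hypothetical. Scaling the uniform grid to a maximum of 6.0 and resolving ties toward the smaller level are simplifying assumptions, not details from the UFP4 recipe.

```python
# Illustrative sketch only; not the paper's implementation.

# Non-negative magnitudes representable in E2M1
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

# Assumption: an 8-level equally spaced magnitude grid standing in for a
# uniform format such as E1M2, scaled to share E2M1's maximum of 6.0.
UNIFORM_GRID = [6.0 * k / 7 for k in range(8)]


def round_to_nearest(x, grid):
    """Quantize x to the nearest grid magnitude, keeping its sign.

    Values beyond the largest level saturate. Ties resolve to the smaller
    level, which is a simplification of hardware rounding modes.
    """
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), grid[-1])
    nearest = min(grid, key=lambda g: abs(g - mag))
    return sign * nearest


def step_sizes(grid):
    """Return the gaps between adjacent grid levels."""
    return [b - a for a, b in zip(grid, grid[1:])]
```

Calling `step_sizes(E2M1_GRID)` returns `[0.5, 0.5, 0.5, 0.5, 1.0, 1.0, 2.0]`, so the gap between levels widens toward larger magnitudes, while every gap in `UNIFORM_GRID` is the same width (6/7). The sketch only shows the grid geometry and does not reproduce the shrinkage-bias analysis from the paper.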

IMPACT This research could lead to more efficient and stable training of large language models by improving quantization techniques.

LLM
NVIDIA Blackwell
INT4
UFP4
MoE 124B
MoE 7.9B
AMD MI350
E2M1
E1M2
Dense 1.5B
Random Hadamard Transform