PulseAugur

New UFP4 recipe tackles shrinkage bias in LLM FP4 pretraining

A new research paper introduces UFP4, a uniform 4-bit training recipe designed to overcome shrinkage bias in large language model (LLM) pretraining. The authors trace this bias to non-uniform FP4 element formats such as E2M1. Current FP4 hardware paths, including NVIDIA Blackwell/Rubin-class systems and AMD MI350-series GPUs, are centered on E2M1. Rounding onto the E2M1 grid produces systematic errors that accumulate across model layers and are amplified by the Random Hadamard Transform (RHT). UFP4 instead uses uniform grids (E1M2/INT4), which avoid this bias. Compared with existing E2M1-based methods, it delivers better quantization quality and smaller loss degradation across the model sizes tested. The findings suggest that future hardware should treat uniform 4-bit grids as primary training primitives.
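The core contrast in the summary is round-to-nearest onto the non-uniform E2M1 grid versus a uniform 4-bit grid, with or without an RHT applied first. The sketch below shows one way to measure that contrast numerically. It is not the UFP4 recipe, and every name in it is hypothetical. It rests on several assumptions:

- per-block absmax scaling with a block size of 16,
- scales kept in full precision,
- a Gaussian stand-in tensor,
- a 16-point RHT (random sign flips followed by an orthonormal Walsh-Hadamard transform).

It reports two numbers for each setup. The first is the regression slope of the quantized tensor on the original; below 1 means the quantized values are, on average, a shrunken copy. The second is the relative MSE.

```python
import numpy as np

# Non-negative magnitudes of FP4 E2M1 (the OCP Microscaling / NVFP4 element format).
E2M1_LEVELS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
# Uniform 4-bit magnitudes: an E1M2-style grid, i.e. symmetric INT4 restricted to [-7, 7].
UNIFORM_LEVELS = np.arange(8, dtype=np.float64)


def quantize_rtn(x, levels, block=16):
    """Round-to-nearest onto a signed 4-bit grid with one absmax scale per block.

    Assumes x.size is a multiple of `block`; scales stay in float64 (no scale quantization).
    Exact ties resolve toward the smaller magnitude (argmin returns the first hit).
    """
    xb = x.reshape(-1, block)
    amax = np.abs(xb).max(axis=1, keepdims=True)
    scale = np.where(amax > 0, amax / levels[-1], 1.0)  # map the block max onto the top level
    mag = np.abs(xb) / scale
    idx = np.abs(mag[..., None] - levels).argmin(axis=-1)  # nearest level per element
    return (np.sign(xb) * levels[idx] * scale).reshape(x.shape)


def fwht_rows(x):
    """Orthonormal fast Walsh-Hadamard transform of each row (row length a power of two)."""
    x = np.array(x, dtype=np.float64)  # private contiguous copy; y below is a view into it
    m, n = x.shape
    h = 1
    while h < n:
        y = x.reshape(m, n // (2 * h), 2, h)
        a, b = y[:, :, 0, :].copy(), y[:, :, 1, :].copy()
        y[:, :, 0, :] = a + b  # butterfly: (a, b) -> (a + b, a - b)
        y[:, :, 1, :] = a - b
        h *= 2
    return x / np.sqrt(n)


def rht(x, signs):
    """Random Hadamard transform: flip signs with a fixed random +/-1 vector, then apply H."""
    return fwht_rows(x.reshape(-1, signs.size) * signs).reshape(x.shape)


def shrinkage_report(x, levels, block=16):
    """Slope <x, q> / <x, x> of q regressed on x (below 1 = shrinkage) and relative MSE."""
    q = quantize_rtn(x, levels, block)
    gain = float(np.dot(x, q) / np.dot(x, x))
    rel_mse = float(np.sum((q - x) ** 2) / np.sum(x ** 2))
    return gain, rel_mse


def compare(n=1 << 16, rht_size=16, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)  # assumption: Gaussian stand-in for a weight/activation tensor
    signs = rng.choice([-1.0, 1.0], size=rht_size)
    xr = rht(x, signs)
    results = {}
    for name, levels in (("E2M1", E2M1_LEVELS), ("uniform", UNIFORM_LEVELS)):
        results[name] = shrinkage_report(x, levels)
        results[name + "+RHT"] = shrinkage_report(xr, levels)
    return results


if __name__ == "__main__":
    for name, (gain, rel_mse) in compare().items():
        print(f"{name:12s} gain={gain:.4f} rel_mse={rel_mse:.5f}")
```

The absolute numbers depend on the assumed distribution, block size, and scaling rule. They illustrate the measurement and do not reproduce the paper's results. Because the RHT is orthogonal, measuring the slope in the rotated domain gives the same value as measuring it after inverting the transform.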

IMPACT This research could make LLM pretraining more efficient by improving the accuracy of 4-bit training, which reduces memory and computation costs.
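The arithmetic below puts the memory side of that claim in numbers by comparing per-element storage for BF16 with block-scaled FP4 layouts. The layouts are the standard published ones and are not details from this paper:

- MXFP4: one 8-bit E8M0 scale per 32 elements.
- NVFP4: one 8-bit E4M3 scale per 16 elements. Its extra per-tensor FP32 scale is ignored as negligible.

The 7-billion-element tensor is a hypothetical example.

```python
# Storage cost per element = element bits + (scale bits / elements sharing that scale).
FORMATS = {
    "BF16": (16, 0, 1),
    "MXFP4 (E2M1 + E8M0 scale per 32)": (4, 8, 32),
    "NVFP4 (E2M1 + E4M3 scale per 16)": (4, 8, 16),  # per-tensor FP32 scale ignored
}


def bits_per_element(elem_bits, scale_bits, block):
    return elem_bits + scale_bits / block


def tensor_gib(num_elems, bits):
    return num_elems * bits / 8 / 2**30


if __name__ == "__main__":
    n = 7_000_000_000  # hypothetical tensor size
    for name, spec in FORMATS.items():
        b = bits_per_element(*spec)
        print(f"{name:34s} {b:5.2f} bits/elem  {tensor_gib(n, b):6.2f} GiB  {16 / b:.2f}x smaller than BF16")
```

These savings apply only to tensors actually held in FP4. In mixed-precision training, master weights and optimizer state typically stay in higher precision, so whole-run memory savings are smaller.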



AI-generated summary · Google Gemini · from 2 sources.


COVERAGE [2]

  1. arXiv cs.AI · Qian Zhao, Kunlong Chen, Changxin Tian, Zhonghui Jiang, Haitao Zhang, Chaofan Yu, Peijie Jiang, Mingliang Gong, Jia Liu, Ziqi Liu, Zhiqiang Zhang, Jun Zhou

    Rethinking Shrinkage Bias in LLM FP4 Pretraining: Geometric Origin, Systemic Impact, and UFP4 Recipe

    arXiv:2606.20381v1 · Abstract: FP4 training promises substantial reductions in memory and computation cost for LLM pretraining, yet current FP4 hardware paths and recipes, including NVIDIA Blackwell/Rubin-class systems and AMD MI350-series GPUs, remain centered o…

  2. arXiv cs.AI · Jun Zhou

    Rethinking Shrinkage Bias in LLM FP4 Pretraining: Geometric Origin, Systemic Impact, and UFP4 Recipe

    FP4 training promises substantial reductions in memory and computation cost for LLM pretraining, yet current FP4 hardware paths and recipes, including NVIDIA Blackwell/Rubin-class systems and AMD MI350-series GPUs, remain centered on E2M1 data elements. In this study, we identify…