Brief · PulseAugur

TOOL · arXiv cs.LG · English (EN) · 7h

ReQAT: Achieving Full-Precision Reasoning Accuracy with 4-bit Floating-Point Quantization-Aware Training

Researchers have developed ReQAT, a training framework that lets Large Reasoning Models (LRMs) retain full-precision reasoning accuracy when quantized to 4-bit floating-point formats. Existing quantization methods struggle with low-entropy tokens such as digits and operators, which degrades reasoning. ReQAT addresses this with three components: Trace-Aligned QAT, Selective Entropy Minimization, and Q-FIT initialization, which together focus training on critical decisions and stabilize it. The approach not only recovers but surpasses standard fine-tuning accuracy, while significantly improving inference speed and reducing hardware requirements.
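For readers unfamiliar with 4-bit floating-point formats, the sketch below shows the kind of rounding that quantization-aware training generally simulates during the forward pass. It is not ReQAT's method: the E2M1 value grid, the per-tensor absmax scale, and the name `fake_quantize_fp4` are all assumptions chosen for illustration, and the brief does not say which FP4 variant or scaling scheme the authors use.

```python
# Illustrative only: round values onto a 4-bit floating-point grid.
# ASSUMPTION: E2M1 layout (1 sign, 2 exponent, 1 mantissa bit), whose
# representable magnitudes are listed below. The brief does not specify this.
FP4_E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def fake_quantize_fp4(values):
    """Quantize then dequantize a list of floats (hypothetical helper).

    ASSUMPTION: a single absmax scale is shared by the whole list, so the
    largest magnitude maps to 6.0, the top of the E2M1 grid.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return [0.0 for _ in values]
    scale = max_abs / FP4_E2M1_MAGNITUDES[-1]
    result = []
    for v in values:
        scaled = abs(v) / scale
        # Pick the closest representable magnitude on the grid.
        nearest = min(FP4_E2M1_MAGNITUDES, key=lambda g: abs(g - scaled))
        signed = nearest if v >= 0 else -nearest
        result.append(signed * scale)
    return result


if __name__ == "__main__":
    weights = [0.02, -0.31, 0.75, -1.2, 0.0]
    print(fake_quantize_fp4(weights))
```

With only eight magnitudes available per sign, small values collapse to zero and mid-range values snap to coarse steps, which is the kind of rounding error a quantization-aware training scheme has to absorb.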

IMPACT Enables more efficient deployment of large reasoning models, potentially reducing hardware costs and increasing inference speeds.

Hugging Face
NVIDIA B200
Large Reasoning Models
bfloat16
NVIDIA DGX Spark
ReQAT
4-bit Floating-Point Quantization-Aware Training