PulseAugur

New ReQAT framework enables 4-bit quantized LLMs to match full-precision reasoning

Researchers have introduced ReQAT, a quantization-aware training (QAT) framework that lets Large Reasoning Models (LRMs) retain full-precision reasoning accuracy when quantized to 4-bit floating-point (FP4) formats. Existing quantization methods struggle with low-entropy tokens such as digits and operators, which degrades reasoning. ReQAT combines three components, Trace-Aligned QAT, Selective Entropy Minimization, and Q-FIT initialization, which together focus training on critical decisions and stabilize it. The authors report that the approach not only recovers but surpasses the accuracy of standard fine-tuning, while improving inference speed and reducing hardware requirements.
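
For readers unfamiliar with 4-bit floating point, the sketch below illustrates what rounding to a microscaled FP4 grid can look like. It is a generic illustration, not ReQAT's method: the helper names (`round_to_e2m1`, `fake_quantize_block`), the nearest-value tie-breaking, and the way the shared power-of-two scale is chosen are all assumptions made for this example. Only the E2M1 value grid (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) reflects the standard FP4 element format.

```python
import math

# Non-negative magnitudes representable in FP4 E2M1
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def round_to_e2m1(x: float) -> float:
    """Round x to the nearest E2M1 value, saturating at +/-6.

    Assumption: ties go to the smaller magnitude; real hardware
    typically uses round-to-nearest-even instead.
    """
    mag = min(abs(x), E2M1_GRID[-1])
    nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
    return math.copysign(nearest, x)


def fake_quantize_block(block):
    """Quantize, then dequantize, one block that shares a single scale.

    Assumption: the scale is the smallest power of two that maps the
    block's largest magnitude to at most 6 (the E2M1 maximum).
    """
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0 for _ in block]
    scale = 2.0 ** math.ceil(math.log2(amax / E2M1_GRID[-1]))
    return [round_to_e2m1(v / scale) * scale for v in block]


if __name__ == "__main__":
    print(fake_quantize_block([0.1, -0.7, 2.4, 5.5]))
```

Small values collapse onto very few grid points (here 0.1 becomes 0), which gives a rough sense of why 4-bit formats can disturb precise outputs. In quantization-aware training generally, a rounding step of this kind is applied in the forward pass so the model learns weights that tolerate it.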

IMPACT Enables more efficient deployment of large reasoning models, potentially reducing hardware costs and increasing inference speeds.

RANK_REASON This is a research paper detailing a new method for quantizing large language models.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.LG TIER_1 English(EN) · Janghwan Lee, Sihwa Lee, Jinseok Kim, Yongjik Kim, Jieun Lim, Jinwook Oh, Jungwook Choi

    ReQAT: Achieving Full-Precision Reasoning Accuracy with 4-bit Floating-Point Quantization-Aware Training

    arXiv:2606.15682v1 · Abstract: Large Reasoning Models (LRMs) achieve strong problem-solving through long chain-of-thought, but their deployment is constrained by the high cost of full-precision inference and growing KV cache footprints. Microscaled FP4 formats en…