Researchers have introduced SOAR, a post-training quantization framework designed to improve the accuracy of NVFP4 quantization for large language models. SOAR employs Closed-form Joint Scale Optimization (CJSO) to jointly optimize the global and block-wise scales by minimizing reconstruction error, and Decoupled Scale Search (DSS) to separate the quantization and dequantization scales for additional precision. Experiments show that SOAR achieves higher accuracy than existing NVFP4 methods without increasing the memory footprint or requiring new hardware.
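The source gives no implementation details for CJSO or DSS, but the general idea of choosing block scales to minimize reconstruction error (rather than the usual max-abs heuristic) can be sketched as follows. This is an illustrative assumption, not SOAR's actual algorithm: the FP4 magnitude grid is NVFP4's E2M1 value set, while the candidate range and grid search are hypothetical choices.

```python
import numpy as np

# Representable non-negative magnitudes of the E2M1 (FP4) format used by NVFP4.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block(x, scale):
    """Quantize a block to FP4 with the given scale, return the dequantized values."""
    sign = np.sign(x)
    mag = np.abs(x) / scale
    # Round each scaled magnitude to the nearest FP4 grid point.
    idx = np.argmin(np.abs(mag[:, None] - FP4_GRID[None, :]), axis=1)
    return sign * FP4_GRID[idx] * scale

def search_block_scale(x, n_candidates=32):
    """Pick a block scale by minimizing squared reconstruction error.

    Starts from the max-abs scale (which maps the largest magnitude to the
    top of the FP4 grid) and grid-searches nearby candidates. Assumes the
    block is not all zeros.
    """
    base = np.abs(x).max() / FP4_GRID[-1]
    best_scale = base
    best_err = np.sum((quantize_block(x, base) - x) ** 2)
    for f in np.linspace(0.5, 1.2, n_candidates):
        s = base * f
        err = np.sum((quantize_block(x, s) - x) ** 2)
        if err < best_err:
            best_scale, best_err = s, err
    return best_scale, best_err
```

By construction the searched scale never does worse than the max-abs baseline; SOAR's contribution, per the summary, is doing this kind of optimization jointly over the global and block scales in closed form, and decoupling the scale used to quantize from the one used to dequantize.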
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Improves LLM efficiency and accuracy by optimizing quantization, potentially reducing computational costs and memory requirements.
RANK_REASON Publication of an academic paper detailing a new technical framework for model quantization.