MARR: Module-Adaptive Residual Reconstruction for Low-Bit Post-Training Quantization
Researchers have developed Module-Adaptive Residual Reconstruction (MARR), a technique for improving low-bit post-training quantization of large language models (LLMs) and vision transformers (ViTs). MARR addresses limitations of existing methods by adaptively balancing error correction against bias, separately for each type of model module. It assigns each module its own scaling coefficient and refines these coefficients with an update strategy based on a PID (proportional-integral-derivative) controller. This leads to significant performance gains, particularly at bit widths of 4 bits or lower.
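The summary does not say how MARR's PID update is formulated or what error signal drives it. As a rough illustration only, here is a generic discrete PID rule adjusting one module's scaling coefficient. The class name `PIDCoefficient`, the gains, the clipping bounds, the starting value, and the toy error signal are all hypothetical and are not taken from MARR.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PIDCoefficient:
    """Hypothetical per-module scaling coefficient driven by a discrete PID rule.

    Positional form: alpha_t = base + kp*e_t + ki*sum(e) + kd*(e_t - e_{t-1}),
    clipped to [lo, hi]. All values below are illustrative, not MARR's.
    """
    base: float = 0.5    # starting coefficient (assumed)
    kp: float = 0.3      # proportional gain (assumed)
    ki: float = 0.2      # integral gain (assumed)
    kd: float = 0.05     # derivative gain (assumed)
    lo: float = 0.0      # lower clip bound (assumed)
    hi: float = 1.0      # upper clip bound (assumed)
    alpha: float = field(default=0.0, init=False)
    _integral: float = field(default=0.0, init=False)
    _prev_error: Optional[float] = field(default=None, init=False)

    def __post_init__(self) -> None:
        self.alpha = self.base

    def update(self, error: float) -> float:
        """Feed one error observation and return the updated coefficient."""
        self._integral += error
        # No derivative term on the first step: there is no previous error yet.
        derivative = 0.0 if self._prev_error is None else error - self._prev_error
        self._prev_error = error
        raw = self.base + self.kp * error + self.ki * self._integral + self.kd * derivative
        self.alpha = min(max(raw, self.lo), self.hi)
        return self.alpha


def run_toy(target: float = 0.8, steps: int = 100) -> float:
    # Toy stand-in: treat alpha itself as the controlled quantity and use the
    # gap to a fixed target as the error. MARR's real error signal is unknown.
    coeff = PIDCoefficient()
    for _ in range(steps):
        coeff.update(target - coeff.alpha)
    return coeff.alpha
```

In this toy loop the integral term is what closes the remaining gap to the target, while the derivative term damps step-to-step swings. A per-module version would presumably keep one such controller per module, fed by whatever error measure the method actually uses.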
IMPACT: Could make LLMs and ViTs more efficient to deploy by reducing the accuracy loss of low-bit quantization.