Researchers have developed TORQ, a framework for quantizing large language models (LLMs) to the MXFP4 format. It targets the accuracy loss that MXFP4 quantization can cause by analyzing and correcting imbalances in how activations are quantized. TORQ applies a two-level orthogonal rotation to reshape the activation space, which the authors report significantly improves LLM accuracy under 4-bit floating-point quantization.
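The following minimal pure-Python sketch shows why rotating activations can help MXFP4. It assumes the OCP Microscaling (MX) conventions: each block of 32 values shares one power-of-two scale, and each element is stored as FP4 E2M1. It compares plain block quantization with quantizing after a single orthonormal Walsh-Hadamard rotation. That rotation is a generic outlier-spreading technique, not TORQ's two-level scheme, whose details are not given in this summary. The function names and the toy activation block are illustrative assumptions.

```python
import math

# Non-negative magnitudes representable by an FP4 E2M1 element (sign bit separate).
FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_EMAX = 2      # exponent of the largest E2M1 value: 6.0 = 1.5 * 2**2
BLOCK_SIZE = 32   # MX block size: 32 elements share one scale


def quantize_mxfp4_block(block):
    """Quantize one block to MXFP4 and return the dequantized values."""
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return [0.0] * len(block)
    # Shared power-of-two scale (E8M0-style) derived from the block maximum;
    # the E8M0 exponent range limits are ignored in this sketch.
    shared_exp = math.floor(math.log2(max_abs)) - FP4_EMAX
    scale = 2.0 ** shared_exp
    out = []
    for v in block:
        a = abs(v) / scale
        # Nearest grid point; min() keeps the smaller magnitude on ties (a
        # simplification of round-to-nearest-even). Values above 6.0 saturate.
        q = min(FP4_E2M1_GRID, key=lambda g: abs(g - a))
        out.append(math.copysign(q * scale, v))
    return out


def hadamard(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of 2.

    The normalized Sylvester Hadamard matrix is symmetric and orthogonal, so
    applying this function twice returns the original vector.
    """
    y = list(x)
    n = len(y)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(n)
    return [v / norm for v in y]


def sq_error(x, x_hat):
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


# Toy block (hypothetical): 31 ordinary activations plus one large outlier.
x = [1.2] * (BLOCK_SIZE - 1) + [30.0]

plain = quantize_mxfp4_block(x)
# Rotate, quantize in the rotated space, then rotate back.
rotated = hadamard(quantize_mxfp4_block(hadamard(x)))

err_plain = sq_error(x, plain)
err_rotated = sq_error(x, rotated)
print(f"squared error, no rotation:       {err_plain:.2f}")
print(f"squared error, Hadamard rotation: {err_rotated:.2f}")
```

Without rotation, the outlier forces a coarse shared scale (4.0), so every ordinary value snaps to 2.0. The rotation is lossless by itself, and it spreads the outlier's energy across all 32 entries, so the shared scale fits the block more evenly. For this input, the rotated error comes out at less than half the unrotated one. Rotation does not help every block, however, which is one reason methods like TORQ look more carefully at how the activation space is rotated.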
Impact: Improves LLM efficiency and accuracy by enabling more accurate low-bit quantization, potentially reducing inference costs.
Ranking rationale: The cluster contains a research paper detailing a new method for LLM quantization.
Read on Hugging Face Daily Papers.
AI-generated summary · Google Gemini · from 1 source.