Brief · PulseAugur

TOOL · arXiv cs.LG English(EN) · 7h

Beyond Output Matching: Preserving Internal Geometry in NVFP4 LLM Distillation

Researchers have introduced CKA-QAD, a method for improving the accuracy of low-precision (NVFP4) large language models (LLMs). Standard quantization-aware distillation (QAD) focuses on matching output distributions, but matching outputs can mask degradation in the model's internal representations. CKA-QAD uses Centered Kernel Alignment (CKA) to preserve the internal geometry of the LLM during distillation, which the authors report leads to better performance on reasoning and coding tasks. The reported gains hold across models including Nemotron 3 Nano and Qwen3-4B-Thinking-2507 and require minimal additional training.
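To make the idea of "internal geometry" concrete, here is a minimal sketch of linear CKA between two sets of hidden activations, plus a hypothetical alignment penalty built from it. The brief does not say which CKA variant, which layers, or which loss weighting the paper uses, so the function names (`linear_cka`, `cka_alignment_loss`) and the `1 - CKA` form of the penalty are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between activation matrices of shape (n_samples, n_features).

    The two matrices must share n_samples but may differ in n_features.
    """
    # Center each feature so the score depends on geometry, not on offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # ||Y^T X||_F^2 normalized by ||X^T X||_F * ||Y^T Y||_F.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def cka_alignment_loss(teacher_acts: np.ndarray, student_acts: np.ndarray) -> float:
    """Hypothetical penalty (an assumption): 0 when geometry matches, up to 1."""
    return 1.0 - linear_cka(teacher_acts, student_acts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.standard_normal((64, 16))  # stand-in for full-precision activations
    student = teacher + 0.5 * rng.standard_normal((64, 16))  # stand-in for quantized ones
    print("alignment loss:", cka_alignment_loss(teacher, student))
```

Linear CKA is invariant to orthogonal transforms and uniform rescaling of the features, so a penalty of this kind would compare how samples are arranged relative to one another rather than the raw activation values. In a distillation setup, such a term would presumably be added to the usual output-matching loss; the exact combination is not described in this brief.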

IMPACT Preserves internal LLM geometry during distillation, improving accuracy for low-precision models on complex tasks.

Nemotron 3 Nano
NVFP4
Qwen3-4B-Thinking-2507
CKA-QAD