Researchers find variance doesn't equal importance in transformer compression

By the PulseAugur editorial team · [1 source] · 2026-04-22 15:31

Researchers ran a systematic study of transformer compression, covering more than 40 experiments on GPT-2 and Mistral 7B. They report that the variance of an activation direction is not a reliable indicator of its predictive importance: projecting onto high-variance directions preserves most of the variance yet still degrades perplexity. The study also finds that transformer blocks are only approximately linear, and only under specific upstream distributions, with linearity generally increasing with model depth. These results point to limits on static post-training compression and to adaptive, per-token computation as a promising alternative.
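The gap between variance and importance can be illustrated with a toy example. The sketch below is not the paper's method: it uses synthetic Gaussian "activations", a hypothetical target `y` that depends only on one low-variance coordinate, and the R² of a linear fit as a stand-in for perplexity. All names and sizes (`n`, `d`, `k`, `scales`) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 5000, 16, 4  # samples, feature dimension, kept directions (all assumed)

# Synthetic "activations": the first k coordinates have large variance,
# the remaining d - k coordinates have small variance.
scales = np.concatenate([np.full(k, 10.0), np.full(d - k, 0.5)])
X = rng.normal(size=(n, d)) * scales

# Hypothetical target that depends only on a low-variance coordinate.
y = X[:, -1] + 0.01 * rng.normal(size=n)

# PCA via SVD of the centered data; keep the top-k variance directions.
Xc = X - X.mean(axis=0)
_, S, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:k].T @ Vt[:k]          # orthogonal projector onto the top-k subspace
X_proj = Xc @ P

# Fraction of total variance retained by the projection.
variance_kept = (S[:k] ** 2).sum() / (S ** 2).sum()


def r2(features, target):
    """R^2 of a least-squares linear fit with an intercept term."""
    A = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ coef
    return 1.0 - resid.var() / target.var()


r2_full = r2(Xc, y)      # fit using all directions
r2_proj = r2(X_proj, y)  # fit using only the high-variance subspace

print(f"variance kept: {variance_kept:.3f}")
print(f"R^2 with all directions: {r2_full:.3f}")
print(f"R^2 after projection:    {r2_proj:.3f}")
```

In this construction the projection retains nearly all of the variance while discarding the one direction the target depends on, so the fit quality collapses. A real transformer is far less clean than this, but the toy shows why a variance-based criterion alone need not track predictive quality.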

Impact: Identifies fundamental limits to static post-training compression and suggests adaptive, per-token computation as a more promising direction for model efficiency.

Ranking rationale: This is a research paper detailing empirical findings on transformer compression techniques.


AI-generated summary · Google Gemini · from 1 source.


Sources [1]

Hugging Face Daily Papers · Tier 1 · English (EN) · 2026-04-22 15:31

Variance Is Not Importance: Structural Analysis of Transformer Compressibility Across Model Scales

We present a systematic empirical study of transformer compression through over 40 experiments on GPT-2 (124M parameters) and Mistral 7B (7.24B parameters). Our analysis covers spectral compression, block-level function replacement, rotation-based quantization, activation geometr…

