Researchers ran a systematic study of transformer compression, covering more than 40 experiments on GPT-2 and Mistral 7B. They found that the variance of an activation direction does not track its predictive importance: projecting activations onto the high-variance directions preserves most of the variance but still degrades perplexity. They also found that transformer blocks are only approximately linear, and only under specific upstream distributions, with linearity generally increasing with depth. Together these results point to limits on static post-training compression and to adaptive, per-token computation as a more promising direction.
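The variance finding in the summary above can be illustrated with a toy example. The sketch below is not the paper's experiment: it uses synthetic Gaussian "activations", a hypothetical linear readout that depends only on a low-variance coordinate, and relative squared error as a stand-in for perplexity. The function name and all parameters are assumptions made for illustration.

```python
import numpy as np


def variance_vs_prediction(n=2000, d=16, k=4, seed=0):
    """Toy check: keep the top-k principal directions, then measure
    (a) the fraction of variance kept and (b) the readout error."""
    rng = np.random.default_rng(seed)

    # Synthetic activations: the first k coordinates have large variance,
    # the remaining d - k coordinates have small variance.
    scales = np.concatenate([np.full(k, 10.0), np.full(d - k, 0.5)])
    X = rng.standard_normal((n, d)) * scales

    # Hypothetical readout that depends only on a low-variance coordinate.
    w = np.zeros(d)
    w[-1] = 1.0
    y = X @ w

    # PCA via SVD of the centered activations.
    mean = X.mean(axis=0)
    Xc = X - mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    # Orthogonal projector onto the top-k principal directions.
    P = Vt[:k].T @ Vt[:k]
    X_proj = Xc @ P + mean

    # Fraction of total variance retained by the projection.
    var_kept = float((s[:k] ** 2).sum() / (s ** 2).sum())

    # Readout error after projection, relative to the variance of y.
    rel_err = float(np.mean((y - X_proj @ w) ** 2) / np.var(y))
    return var_kept, rel_err


if __name__ == "__main__":
    var_kept, rel_err = variance_vs_prediction()
    print(f"variance kept: {var_kept:.3f}, relative readout error: {rel_err:.3f}")
```

In this setup the projection keeps nearly all of the variance while discarding almost all of the signal the readout needs, which is the kind of mismatch between variance and predictive importance the summary describes.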
Impact: Identifies fundamental limits of static post-training compression and suggests adaptive, per-token computation as a more promising direction for model efficiency.
Ranking rationale: This is a research paper reporting empirical findings on transformer compression techniques.
Read on Hugging Face Daily Papers →
AI-generated summary · Google Gemini · from 1 source.