PulseAugur
Cross-Layer Subspace Coupling for LLM Compression: A Unifying Framework and Its Empirical Limits

New research explores LLM compression, with a focus on uncertainty and submodule optimization

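Researchers are exploring new ways to compress large language models (LLMs) while preserving both their performance and their ability to quantify uncertainty. One study introduces SubFit, which compresses LLMs at the submodule level and achieves a better accuracy-perplexity trade-off than existing methods. Another paper, ProjQ, constrains quantization noise to a low-rank structure, improving adapter-aware compression. A third paper examines whether compression techniques affect an LLM's ability to quantify its uncertainty, finding that larger models are more robust to compression and that accuracy alone is not sufficient to meet deployment requirements. Finally, a unified framework for SVD-based compression is proposed, but it highlights that weight-space reconstruction is a flawed objective and suggests that future cross-layer compression shift toward activation reconstruction.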

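Impact: These research papers introduce advanced LLM compression techniques that could enable more efficient deployment and improved performance in practical applications.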

Ranking rationale: Several academic papers published on arXiv that detail new methods and analyses for LLM compression.


AI-generated summary · Google Gemini · from 5 sources.

Sources [5]

  1. arXiv cs.AI TIER_1 English(EN) · Yujia Tong, Yuxi Wang, Yunyang Wan, Tian Zhang, Junhao Dong, Jingling Yuan ·

    Does Compression Preserve Uncertainty? A Unified Benchmark for Quantized and Sparse LLMs via Conformal Prediction

    arXiv:2606.01850v1 Announce Type: new Abstract: Model compression techniques such as quantization and pruning are widely used to reduce the deployment cost of large language models (LLMs), with existing evaluations focusing almost exclusively on accuracy preservation. However, in…

  2. arXiv cs.AI TIER_1 English(EN) · Elia Cunegatti, Marcus Vukojevic, Erik Nielsen, Giovanni Iacca ·

    From Layers to Submodules: Rethinking Granularity in Replacement-Based LLM Compression

    arXiv:2606.02559v1 Announce Type: cross Abstract: Post-training compression of Large Language Models (LLMs) removes entire architectural components, either deleting them or replacing them with fitted modules. Existing replacement-based methods share two design constraints: full-l…

  3. arXiv cs.LG TIER_1 English(EN) · Wneya Yu, Chao Zhang, Li Wang, Samson Lasaulce, Merouane Debbah ·

    ProjQ: Project-and-Quantize for Adapter-Aware LLM Compression

    arXiv:2606.00494v1 Announce Type: new Abstract: Post-Training Quantization (PTQ) and Low-Rank Adaptation (LoRA) constitute the standard pipeline for efficient Large Language Model (LLM) deployment. However, applying them sequentially poses a problem: PTQ often leaves behind rando…

  4. arXiv cs.AI TIER_1 English(EN) · Giovanni Iacca ·

    From Layers to Submodules: Rethinking Granularity in Replacement-Based LLM Compression

    Post-training compression of Large Language Models (LLMs) removes entire architectural components, either deleting them or replacing them with fitted modules. Existing replacement-based methods share two design constraints: full-layer granularity and contiguous selection. We argu…

  5. arXiv cs.LG TIER_1 English(EN) · Snigdha Chandan Khilar ·

    Cross-Layer Subspace Coupling for LLM Compression: A Unifying Framework and Its Empirical Limits

    arXiv:2605.30836v1 Announce Type: new Abstract: Recent SVD based compression methods for large language models like SVD LLM and Basis Sharing can be unified under one optimization problem. While mathematical proofs and tests on Pythia models show this unified approach improves we…