New Research Explores Advanced Compression Techniques for AI Models

By the PulseAugur editorial team · [24 sources] · 2026-06-01 04:00

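Researchers are exploring new ways to compress large models and datasets for efficiency. The papers cover unifying dataset pruning and distillation, bootstrapped tokenization for image generation, and activation-aware low-rank compression for LLMs and VLMs. Other work focuses on generic triple-latent sequence models, the theory of prediction under imperfect compression, and joint optimization of architecture and quantization choices for LLM compression.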

Impact: Advances in compression techniques can significantly lower deployment costs and make large AI models more accessible.

Ranking rationale: Multiple arXiv papers detail new methods and theoretical analyses for compressing AI models and data.


AI-generated summary · Google Gemini · from 24 sources.

Sources [24]

arXiv cs.AI TIER_1 English(EN) · Hoang-Loc La, Truong-Thanh Le, Amir Taherkordi, Phuong Hoai Ha · 2026-06-09 04:00

Joint Structured Pruning and Mixed-Precision Quantization for LLM Compression

arXiv:2606.07819v1 Announce Type: new Abstract: Recently, the efficiency of Large Language Models (LLMs) deployment has become a critical concern in practical applications. While post-training quantization (PTQ) and structural pruning are established techniques for reducing memor…
arXiv cs.CL TIER_1 English(EN) · Ernests Lavrinovics, Marco Letizia, Roy Janco, Shai Segal, Johannes Bjerva, Maurizio Pierini · 2026-06-08 04:00

SigmaScale: LLM Compression via SVD-Based Low-Rank Decomposition and Learned Scaling Matrices

arXiv:2606.07098v1 Announce Type: new Abstract: We present SigmaScale, a method for learning auxiliary scaling matrices $S$ to aid truncated Singular Value Decomposition (SVD) based Large Language Model (LLM) compression. Instead of deriving scaling matrices analytically, SigmaSc…
arXiv cs.AI TIER_1 English(EN) · Liangji Zhu, Sanjay Ranka, Anand Rangarajan · 2026-06-06 04:00

Residual Modeling for High-Fidelity Learned Compression of Scientific Data

arXiv:2606.05389v1 Announce Type: new Abstract: Lossy compression is essential for massive spatiotemporal data from scientific simulations. Learned compressors can achieve high compression ratios at moderate accuracy targets, but their aggregate reconstruction losses do not guara…
arXiv cs.AI TIER_1 English(EN) · Rui Wang, Yan Zhao, Li Song, Zhengxue Cheng · 2026-06-06 04:00

LLMCodec: Adapting Video Codecs for Efficient Weight Compression of Large Language Models

arXiv:2606.05861v1 Announce Type: cross Abstract: The rapid development of large language models(LLMs) has led to remarkable advances in natural language processing. However, the increasing scale of these models introduces substantial challenges in terms of storage, transmission,…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-05 09:48

SigmaScale: LLM Compression via SVD-Based Low-Rank Decomposition and Learned Scaling Matrices

SigmaScale learns auxiliary scaling matrices to improve truncated SVD-based LLM compression by adapting to individual weight structures through activation-aware transformations.
arXiv cs.CL TIER_1 English(EN) · Maurizio Pierini · 2026-06-05 09:48

SigmaScale: LLM Compression via SVD-Based Low-Rank Decomposition and Learned Scaling Matrices

We present SigmaScale, a method for learning auxiliary scaling matrices $S$ to aid truncated Singular Value Decomposition (SVD) based Large Language Model (LLM) compression. Instead of deriving scaling matrices analytically, SigmaScale optimizes two sets of vectors that define di…
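
As a rough illustration of why truncating an SVD saves memory, the sketch below counts the parameters stored by a rank-r factorization of an m×n weight matrix. It is generic low-rank bookkeeping and does not reproduce SigmaScale's learned scaling matrices; the function names and the 4096×4096 example shape are assumptions made for illustration.

```python
def dense_params(m, n):
    """Parameters in a dense m x n weight matrix."""
    return m * n


def low_rank_params(m, n, r):
    """Parameters in a rank-r factorization W ~ A @ B, with A (m x r) and B (r x n)."""
    return r * (m + n)


def max_rank_for_budget(m, n, keep_ratio):
    """Largest rank whose two factors fit within keep_ratio of the dense size."""
    return int(keep_ratio * dense_params(m, n) // (m + n))


# Hypothetical 4096 x 4096 projection compressed to half its parameters.
M, N = 4096, 4096
RANK = max_rank_for_budget(M, N, 0.5)
RATIO = low_rank_params(M, N, RANK) / dense_params(M, N)
```

For a square matrix the factorization only saves memory when r is below n/2, which is why the choice of which singular directions to keep (the part scaling matrices are meant to help with) matters so much.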
arXiv cs.CL TIER_1 English(EN) · Liu Xiao · 2026-06-05 04:00

Generic Triple-Latent Compression with Gated Associative Retrieval

arXiv:2606.05175v1 Announce Type: new Abstract: We study generic triple-latent sequence models that maintain a running token state and compressed pair-memory pathway to capture higher-order token interactions without benchmark-specific parsing. The triple-latent family improves a…
arXiv cs.LG TIER_1 English(EN) · Lingao Xiao, Songhua Liu, Yang He, Xinchao Wang · 2026-06-05 04:00

Unifying Dataset Pruning and Distillation for Efficient Large-Scale Compression

arXiv:2502.06434v2 Announce Type: replace-cross Abstract: Dataset pruning (DP) and dataset distillation (DD) fundamentally differ in their outputs: DP selects original image subsets, while DD generates synthetic images. Recently, DD's increasing reliance on original images sugges…
arXiv cs.LG TIER_1 English(EN) · Haozhe Chi, Jinghan Li, Hao Jiang, Wu Sheng, Yi Ma, Jing Wang, Yadong Mu · 2026-06-05 04:00

Balancing Image Compression and Generation with Bootstrapped Tokenization

arXiv:2606.05552v1 Announce Type: new Abstract: Despite progress in image tokenization, standard methods encode redundant information by mixing all granularities within each token, thus redundancy persists between tokens. The mix of information of different granularity also compl…
arXiv cs.CL TIER_1 English(EN) · Ryan Solgi, Parsa Madinei, Jiayi Tian, Rupak Swaminathan, Jing Liu, Nathan Susanj, Zheng Zhang · 2026-06-05 04:00

Activation-Informed Pareto-Guided Low-Rank Compression for Efficient LLM/VLM

arXiv:2510.05544v2 Announce Type: replace Abstract: Large language models (LLM) and vision-language models (VLM) have achieved state-of-the-art performance, but they impose significant memory and computing challenges in deployment. We present a novel low-rank compression framewor…
arXiv cs.AI TIER_1 English(EN) · Hoang-Loc La, Truong-Thanh Le, Amir Taherkordi, Phuong Hoai Ha · 2026-06-04 04:00

Jointly Optimizing Architecture and Quantization Choices for LLM Compression

arXiv:2606.04063v1 Announce Type: cross Abstract: Deploying large language models (LLMs) is challenging due to their significant memory and computational requirements. While some methods address this by developing small or tiny language models from scratch, these approaches deman…
arXiv cs.LG TIER_1 English(EN) · Qian Li, Xinyu Mao, Shang-Hua Teng, Guangxu Yang · 2026-06-04 04:00

Prediction under Imperfect Compression: A Theory of Approximate MDL

arXiv:2606.04834v1 Announce Type: new Abstract: Minimum Description Length (MDL) formalizes the principle of Occam's razor by optimizing the total description length: $L(\mathrm{model})+L(\mathrm{data} \ | \ \mathrm{model})$. For sequential prediction, the MDL method repeatedly s…
arXiv cs.LG TIER_1 English(EN) · Guangxu Yang · 2026-06-03 13:03

Prediction under Imperfect Compression: A Theory of Approximate MDL

Minimum Description Length (MDL) formalizes the principle of Occam's razor by optimizing the total description length: $L(\mathrm{model})+L(\mathrm{data} \ | \ \mathrm{model})$. For sequential prediction, the MDL method repeatedly selects a model with a minimum objective score of…
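
The two-part objective L(model) + L(data | model) can be made concrete with a toy selection step. The sketch below is a minimal illustration under stated assumptions: the Bernoulli candidates, their model costs in bits, and the function names are all hypothetical, and it shows plain two-part MDL selection, not the approximate-MDL analysis of the paper.

```python
import math


def data_bits(observations, p_one):
    """Code length in bits of a 0/1 sequence under a Bernoulli(p_one) model."""
    return -sum(math.log2(p_one if x else 1.0 - p_one) for x in observations)


def mdl_select(observations, candidates):
    """Return the candidate name minimizing L(model) + L(data | model).

    candidates maps a name to (p_one, model_bits); both values are
    illustrative assumptions, not taken from the paper.
    """
    def total_bits(name):
        p_one, model_bits = candidates[name]
        return model_bits + data_bits(observations, p_one)

    return min(candidates, key=total_bits)


# A cheap "fair coin" model and a costlier "biased coin" model.
CANDIDATES = {"fair": (0.5, 1.0), "biased": (0.9, 8.0)}
short_run = [1, 1, 1, 0]
long_run = [1] * 18 + [0] * 2
```

On `short_run` the cheaper model is selected because its lower model cost outweighs the better fit of the alternative; on `long_run` the data term dominates and the better-fitting, costlier model is selected.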
arXiv cs.CL TIER_1 English(EN) · Justice Owusu Agyemang, Jerry John Kponyo, Kwame Opuni-Boachie Obour Agyekum, Francisca Adoma Acheampong, Kwame Agyeman-Prempeh Agyekum, James Dzisi Gadze · 2026-06-03 04:00

Entropy Gate: Entropy Quenching for Near-Lossless Token Compression in LLM Pipelines

arXiv:2606.03739v1 Announce Type: new Abstract: LLM pipelines waste substantial token budgets on low-information content: repeated context, verbose responses, and redundant boilerplate. We introduce Entropy Gate, a token compression framework applying entropy quenching, a ther…
arXiv cs.AI TIER_1 English(EN) · Artur Zagitov, Alexander Miasnikov, Maxim Krutikov, Vladimir Aletov, Gleb Molodtsov, Nail Bashirov, Artem Tsedenov, Aleksandr Beznosikov · 2026-06-03 04:00

Rethinking the Role of Tensor Decompositions in Post-Training LLM Compression

arXiv:2606.03465v1 Announce Type: cross Abstract: Post-training compression is essential for deploying large language models (LLMs) under tight resource constraints. Tensor decompositions have emerged as a promising direction, offering compact parameterizations well suited to Tra…
arXiv cs.CL TIER_1 English(EN) · James Dzisi Gadze · 2026-06-02 14:55

Entropy Gate: Entropy Quenching for Near-Lossless Token Compression in LLM Pipelines

LLM pipelines waste substantial token budgets on low-information content: repeated context, verbose responses, and redundant boilerplate. We introduce Entropy Gate, a token compression framework applying entropy quenching, a thermodynamic process that progressively freezes out…
arXiv cs.LG TIER_1 English(EN) · Aleksandr Beznosikov · 2026-06-02 10:45

Rethinking the Role of Tensor Decompositions in Post-Training LLM Compression

Post-training compression is essential for deploying large language models (LLMs) under tight resource constraints. Tensor decompositions have emerged as a promising direction, offering compact parameterizations well suited to Transformer weight structures. However, existing stud…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-02 10:45

Rethinking the Role of Tensor Decompositions in Post-Training LLM Compression

Post-training compression is essential for deploying large language models (LLMs) under tight resource constraints. Tensor decompositions have emerged as a promising direction, offering compact parameterizations well suited to Transformer weight structures. However, existing stud…
arXiv cs.AI TIER_1 English(EN) · Yujia Tong, Yuxi Wang, Yunyang Wan, Tian Zhang, Junhao Dong, Jingling Yuan · 2026-06-02 04:00

Does Compression Preserve Uncertainty? A Unified Benchmark for Quantized and Sparse Large Models via Conformal Prediction

arXiv:2606.01850v1 Announce Type: new Abstract: Model compression techniques such as quantization and pruning are widely used to reduce the deployment cost of large language models (LLMs), with existing evaluations focusing almost exclusively on accuracy preservation. However, in…
arXiv cs.LG TIER_1 English(EN) · Wneya Yu, Chao Zhang, Li Wang, Samson Lasaulce, Merouane Debbah · 2026-06-02 04:00

ProjQ: Project and Quantize for Adapter-Aware LLM Compression

arXiv:2606.00494v1 Announce Type: new Abstract: Post-Training Quantization (PTQ) and Low-Rank Adaptation (LoRA) constitute the standard pipeline for efficient Large Language Model (LLM) deployment. However, applying them sequentially poses a problem: PTQ often leaves behind rando…
arXiv cs.AI TIER_1 English(EN) · Elia Cunegatti, Marcus Vukojevic, Erik Nielsen, Giovanni Iacca · 2026-06-02 04:00

From Layers to Submodules: Rethinking Granularity in Replacement-Based LLM Compression

arXiv:2606.02559v1 Announce Type: cross Abstract: Post-training compression of Large Language Models (LLMs) removes entire architectural components, either deleting them or replacing them with fitted modules. Existing replacement-based methods share two design constraints: full-l…
arXiv cs.AI TIER_1 English(EN) · Giovanni Iacca · 2026-06-01 17:52

From Layers to Submodules: Rethinking Granularity in Replacement-Based LLM Compression

Post-training compression of Large Language Models (LLMs) removes entire architectural components, either deleting them or replacing them with fitted modules. Existing replacement-based methods share two design constraints: full-layer granularity and contiguous selection. We argu…
arXiv cs.LG TIER_1 English(EN) · Snigdha Chandan Khilar · 2026-06-01 04:00

Cross-Layer Subspace Coupling for Large Language Model Compression: A Unified Framework and Its Empirical Limits

arXiv:2605.30836v1 Announce Type: new Abstract: Recent SVD based compression methods for large language models like SVD LLM and Basis Sharing can be unified under one optimization problem. While mathematical proofs and tests on Pythia models show this unified approach improves we…
r/LocalLLaMA TIER_1 English(EN) · /u/RudeChocolate9217 · 2026-06-05 02:38

proveKV – Honest 36× lossless (vs. f32; 18× vs. fp16) KV cache compression for LLMs (zero PPL regression)

I'm sharing a new open-source repo that demonstrates a reproducible KV-cache compression technique.
- Result: 36× lossless / 68× lossy memory reduction vs. f32-raw KV cache on SmolLM2-1.7B + WikiText-2 (0% ΔPPL).
- Transpare…

