JacQuant: STE-Free Quantization-Aware Training via Learned Jacobian Surrogates

New AI research focuses on improving model efficiency through quantization and token pruning

By the PulseAugur editorial team · [9 sources] · 2026-05-25 06:19

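Researchers are developing new methods to make AI models more efficient through quantization and token pruning. One method, PeRQ, strengthens post-training quantization by redistributing activation mass before rotation, which significantly improves accuracy on models such as Llama3 1B. Another, OccamToken, prunes visual tokens in vision-language models (VLMs) with a register-anchored relative-evidence test, cutting the token count while preserving accuracy. Clark Hash provides a stateless codec for compact neural embedding storage that reduces space requirements by 32x with minimal accuracy loss. JacQuant introduces a quantization-aware training framework that learns Jacobian surrogates to stabilize and speed up training, reaching higher accuracy than conventional methods in ultra-low-bit LLM quantization.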

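Impact: These advances in quantization and token pruning are expected to yield more efficient AI models, enabling broader deployment and lower compute costs.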

Ranking rationale: This cluster contains several arXiv papers that detail new research on AI model optimization techniques.


AI-generated summary · Google Gemini · from 9 sources.

Sources [9]

arXiv cs.AI TIER_1 English(EN) · Sai Sanjeet, Ian Colbert, Pablo Monteagudo-Lago, Giuseppe Franco, Yaman Umuroglu, Nicholas J. Fraser · 2026-05-29 04:00

Exploring the Limits of Block Rotations in Post-Training Quantization

arXiv:2601.22347v2 Announce Type: replace-cross Abstract: Recent post-training quantization (PTQ) methods have adopted block rotations to diffuse outliers prior to rounding. While this reduces the overhead of online full-vector rotations, the effect of block structure on outlier …
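
To make the idea of a block rotation concrete, here is a minimal sketch, assuming a normalized Hadamard matrix as the rotation and a toy 8-element activation vector. The helper names `hadamard` and `block_rotate` and the block size are hypothetical choices for illustration; they are not taken from the paper.

```python
import math

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix, scaled to be orthonormal.

    Assumes n is a power of two.
    """
    assert n > 0 and n & (n - 1) == 0
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    scale = 1.0 / math.sqrt(n)
    return [[v * scale for v in row] for row in h]

def block_rotate(x, block_size):
    """Rotate each contiguous block of x by the same orthonormal matrix."""
    assert len(x) % block_size == 0
    h = hadamard(block_size)
    out = []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        out.extend(
            sum(h[i][j] * block[j] for j in range(block_size))
            for i in range(block_size)
        )
    return out

# Toy vector (an assumption): one outlier in the first block, a flat second block.
x = [8.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
y = block_rotate(x, block_size=4)
print(max(abs(v) for v in x), max(abs(v) for v in y))
```

In this toy case the outlier of magnitude 8 is spread into four entries of magnitude 4, while the vector norm is unchanged. The outlier only spreads within its own block, which is the structural limitation the abstract refers to.
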
arXiv cs.AI TIER_1 English(EN) · Geng Li, Guohao Chen, Ting Chen, Shilin Shan, Kuangji Zuo, Bofan Lyu, Tuo An, Gen Li, Jianfei Yang · 2026-05-29 04:00

OccamToken: Training-Free, Budget-Adaptive Efficient Token Pruning for VLM Inference

arXiv:2605.29657v1 Announce Type: cross Abstract: Vision-language models (VLMs) rely on long visual token sequences for visual understanding, making the prefill stage expensive in both computation and memory. Most existing pruning methods follow an absolute-ranking paradigm, assi…
arXiv cs.AI TIER_1 English(EN) · Stanislav Kirdey, Clark Labs Inc · 2026-05-28 04:00

Clark Hash: Stateless Sparse Johnson-Lindenstrauss Quantization for Neural Embeddings

arXiv:2605.28034v1 Announce Type: new Abstract: Clark Hash is a small method for storing neural embeddings in less space. It normalizes each database vector, applies a deterministic sparse signed Johnson-Lindenstrauss projection, clips the result, and stores a fixed-width scalar-…
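
The steps the abstract lists (normalize, sparse signed projection, clip, fixed-width code) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the Clark Hash implementation: the hash-derived slots and signs, the parameters `out_dim`, `nnz`, `clip`, `bits`, and `seed`, and the uniform quantizer in the last step are all hypothetical, since the truncated abstract does not specify them.

```python
import hashlib
import math

def _slot_and_sign(seed, i, k, out_dim):
    """Derive an output slot and a +/-1 sign from a hash, so no projection matrix is stored."""
    digest = hashlib.sha256(f"{seed}:{i}:{k}".encode()).digest()
    slot = int.from_bytes(digest[:4], "big") % out_dim
    sign = 1.0 if digest[4] & 1 else -1.0
    return slot, sign

def encode(vec, out_dim=8, nnz=2, clip=0.5, bits=4, seed=0):
    """Normalize, project sparsely with signs, clip, and quantize to integer codes."""
    # Step 1: normalize to unit length (a zero vector is left as-is).
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    unit = [v / norm for v in vec]
    # Step 2: each input coordinate contributes to nnz hashed output slots.
    proj = [0.0] * out_dim
    for i, v in enumerate(unit):
        for k in range(nnz):
            slot, sign = _slot_and_sign(seed, i, k, out_dim)
            proj[slot] += sign * v / math.sqrt(nnz)
    # Steps 3 and 4: clip to [-clip, clip], then map to integers in [0, 2**bits - 1].
    levels = (1 << bits) - 1
    codes = []
    for p in proj:
        p = max(-clip, min(clip, p))
        codes.append(round((p + clip) / (2 * clip) * levels))
    return codes

codes = encode([0.3, -1.2, 0.5, 2.0, -0.7, 0.1])
print(codes)
```

Because the slots and signs come from a hash of the seed and indices, encoding the same vector twice gives the same codes without any stored state, which is one way to read "stateless" in the abstract.
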
arXiv cs.AI TIER_1 English(EN) · Zhanfeng Feng, Shuai Guo, Xin Di, Long Peng, Yang Cao, Zhengjun Zha · 2026-05-27 04:00

Tail-Aware HiFloat4: W4A4 Post-Training Quantization for Wan2.2

arXiv:2605.26628v1 Announce Type: new Abstract: This report describes Tail-Aware HiFloat4, our submission to the low-bit text-to-video generation quantization challenge. Our method adapts the public ViDiT-Q post-training quantization pipeline to Wan2.2 under the HiFloat4 numerica…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-27 00:00

Clark Hash: Stateless Sparse Johnson-Lindenstrauss Quantization for Neural Embeddings

Clark Hash is a compact, stateless codec that reduces neural embedding storage size by 32x through deterministic sparse projections and scalar quantization while maintaining high similarity accuracy.
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-26 07:04

Tail-Aware HiFloat4: W4A4 Post-Training Quantization for Wan2.2

This report describes Tail-Aware HiFloat4, our submission to the low-bit text-to-video generation quantization challenge. Our method adapts the public ViDiT-Q post-training quantization pipeline to Wan2.2 under the HiFloat4 numerical format. We quantize the main linear layers in …
arXiv cs.LG TIER_1 English(EN) · Kai Yi, Vignesh Vivekraja, Harshit Khaitan, Steven Li · 2026-05-26 04:00

JacQuant: STE-Free Quantization-Aware Training via Learned Jacobian Surrogates

arXiv:2605.25469v1 Announce Type: new Abstract: Quantization-aware training (QAT) is widely deployed but typically relies on the Straight-Through Estimator (STE), which passes gradients through non-differentiable quantizers by fiat. This often makes training brittle near bin boun…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-25 06:19

JacQuant: STE-Free Quantization-Aware Training via Learned Jacobian Surrogates

Quantization-aware training (QAT) is widely deployed but typically relies on the Straight-Through Estimator (STE), which passes gradients through non-differentiable quantizers by fiat. This often makes training brittle near bin boundaries and weakly aligned with the actual behavi…
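
For readers unfamiliar with the Straight-Through Estimator, the sketch below shows what "passing gradients through by fiat" means on a single scalar weight. It illustrates the conventional STE baseline only, not JacQuant's learned surrogate; the quantizer step, the squared-error loss, and the starting values are assumptions chosen for illustration.

```python
def quantize(w, step=0.5):
    """Uniform quantizer: snap w to the nearest multiple of step."""
    return round(w / step) * step

def ste_grad(w, x, y, step=0.5):
    """Gradient of (quantize(w) * x - y)**2 with respect to w under STE.

    The true derivative of quantize is 0 almost everywhere, so the exact
    chain rule would give no signal. STE replaces it with 1.
    """
    err = quantize(w, step) * x - y
    return 2.0 * err * x

# Hypothetical setup: the latent weight starts just below the bin boundary at 0.75.
w, x, y, lr = 0.74, 1.0, 1.0, 0.1
for _ in range(3):
    w -= lr * ste_grad(w, x, y)
print(w, quantize(w))
```

Here one update pushes the latent weight across the boundary at 0.75, and the quantized value jumps from 0.5 to 1.0 in a single step. The gradient depends only on the quantized value and carries no information about how close the weight sits to a boundary, which is the kind of mismatch the abstract describes.
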
r/StableDiffusion TIER_2 English(EN) · /u/AgeNo5351 · 2026-05-27 17:34

A Wan 2.2 post-training quant: 1 model instead of high + low

https://www.reddit.com/r/StableDiffusion/comments/1tpcm59/a_wan_22_posttraining_quant_1_model_instead_of/
