JacQuant: STE-Free Quantization-Aware Training via Learned Jacobian Surrogates
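New AI research focuses on improving model efficiency through quantization and token pruning
By the PulseAugur editorial team · [9 sources]
Researchers are developing new methods to make AI models more efficient through quantization and token pruning. One method, PeRQ, strengthens post-training quantization by redistributing activation mass before rotation, significantly improving the accuracy of models such as Llama3 1B. Another, OccamToken, prunes visual tokens in vision-language models (VLMs) using a register-anchored relative evidence test, reducing the token count while preserving accuracy. In addition, Clark Hash offers a stateless codec for compact neural embedding storage, cutting space requirements by 32x with minimal accuracy loss. JacQuant introduces a quantization-aware training framework that learns Jacobian surrogates to stabilize and accelerate training, achieving higher accuracy than conventional methods in ultra-low-bit LLM quantization.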
AI
arXiv:2601.22347v2 Announce Type: replace-cross Abstract: Recent post-training quantization (PTQ) methods have adopted block rotations to diffuse outliers prior to rounding. While this reduces the overhead of online full-vector rotations, the effect of block structure on outlier …
arXiv cs.AI
Geng Li, Guohao Chen, Ting Chen, Shilin Shan, Kuangji Zuo, Bofan Lyu, Tuo An, Gen Li, Jianfei Yang
arXiv:2605.29657v1 Announce Type: cross Abstract: Vision-language models (VLMs) rely on long visual token sequences for visual understanding, making the prefill stage expensive in both computation and memory. Most existing pruning methods follow an absolute-ranking paradigm, assi…
arXiv cs.AI
Stanislav Kirdey, Clark Labs Inc
arXiv:2605.28034v1 Announce Type: new Abstract: Clark Hash is a small method for storing neural embeddings in less space. It normalizes each database vector, applies a deterministic sparse signed Johnson-Lindenstrauss projection, clips the result, and stores a fixed-width scalar-…
arXiv cs.AI
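The encoding steps named in the Clark Hash abstract (normalize, deterministic sparse signed projection, clip, fixed-width scalar code) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the abstract is truncated, so the output dimension, the number of non-zeros per input coordinate, the clip range, the 4-bit code width, and the use of BLAKE2b as the deterministic hash are all assumptions chosen for the example.

```python
import hashlib
import math


def _hash_u64(seed: int, i: int, j: int) -> int:
    """Deterministic 64-bit hash of (seed, input index, replica index)."""
    digest = hashlib.blake2b(f"{seed}:{i}:{j}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def sparse_signed_projection(vec, out_dim, nnz_per_input=2, seed=0):
    """Sparse signed Johnson-Lindenstrauss-style projection.

    Each input coordinate is added, with a hash-chosen +/-1 sign, to
    nnz_per_input hash-chosen output slots (hypothetical parameter).
    """
    out = [0.0] * out_dim
    scale = 1.0 / math.sqrt(nnz_per_input)
    for i, x in enumerate(vec):
        for j in range(nnz_per_input):
            h = _hash_u64(seed, i, j)
            slot = h % out_dim
            sign = 1.0 if (h >> 32) & 1 else -1.0
            out[slot] += sign * scale * x
    return out


def encode(vec, out_dim=16, bits=4, clip=0.5, seed=0):
    """Normalize, project, clip, and map each value to a fixed-width code.

    out_dim, bits, and clip are assumed values, not taken from the paper.
    """
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    unit = [x / norm for x in vec]
    proj = sparse_signed_projection(unit, out_dim, seed=seed)
    levels = (1 << bits) - 1
    codes = []
    for p in proj:
        clipped = max(-clip, min(clip, p))
        # Uniform scalar quantization of [-clip, clip] onto 0..levels.
        codes.append(round((clipped + clip) / (2 * clip) * levels))
    return codes


codes = encode([1.0, 2.0, 3.0, 4.0])
```

Because the projection depends only on the seed and the indices, the sketch needs no fitted codebook or stored matrix, which is one plausible reading of "stateless" in the summary.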
Zhanfeng Feng, Shuai Guo, Xin Di, Long Peng, Yang Cao, Zhengjun Zha
arXiv:2605.26628v1 Announce Type: new Abstract: This report describes Tail-Aware HiFloat4, our submission to the low-bit text-to-video generation quantization challenge. Our method adapts the public ViDiT-Q post-training quantization pipeline to Wan2.2 under the HiFloat4 numerical format. We quantize the main linear layers in …
Clark Hash is a compact, stateless codec that reduces neural embedding storage size by 32x through deterministic sparse projections and scalar quantization while maintaining high similarity accuracy.
arXiv cs.LG
Kai Yi, Vignesh Vivekraja, Harshit Khaitan, Steven Li
arXiv:2605.25469v1 Announce Type: new Abstract: Quantization-aware training (QAT) is widely deployed but typically relies on the Straight-Through Estimator (STE), which passes gradients through non-differentiable quantizers by fiat. This often makes training brittle near bin boundaries and weakly aligned with the actual behavi…
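For readers unfamiliar with the baseline the abstract criticizes, here is a minimal scalar sketch of the conventional Straight-Through Estimator. It is not JacQuant's learned Jacobian surrogate, which the truncated abstract does not describe. The quantizer step, learning rate, target, and function names are hypothetical values picked so the behavior near a bin boundary is easy to see.

```python
def quantize(w, step=0.25):
    """Uniform round-to-nearest quantizer (assumed step size)."""
    return round(w / step) * step


def ste_grad(w, x, target, step=0.25):
    """Gradient of (quantize(w) * x - target)**2 with respect to w under STE.

    The true derivative of quantize() is zero almost everywhere, so the
    STE simply substitutes 1 for it in the backward pass.
    """
    err = quantize(w, step) * x - target
    d_quant_d_w = 1.0  # the straight-through substitution
    return 2.0 * err * x * d_quant_d_w


def train(w=0.1, x=1.0, target=0.8, lr=0.05, steps=200):
    """Plain gradient descent on the latent weight; returns it and a trace."""
    trace = []
    for _ in range(steps):
        trace.append(quantize(w))
        w -= lr * ste_grad(w, x, target)
    return w, trace


w_final, trace = train()
```

With these settings the target 0.8 lies between the quantization levels 0.75 and 1.0, so the latent weight drifts to the bin boundary at 0.875 and the quantized value keeps flipping between the two levels instead of settling. This is a small instance of the brittleness near bin boundaries that the abstract mentions.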
A Wan 2.2 post-training Quant . 1 model instead of high + low (https://www.reddit.com/r/StableDiffusion/comments/1tpcm59/a_wan_22_posttraining_quant_1_model_instead_of/)