PulseAugur

New AI Research Focuses on Improving Model Efficiency Through Quantization and Token Pruning

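Researchers are developing new methods to improve AI model efficiency through quantization and token pruning. One method, PeRQ, strengthens post-training quantization by redistributing activation mass before rotation, which significantly improves accuracy for models such as Llama3 1B. Another, OccamToken, prunes visual tokens in vision-language models (VLMs) with a register-anchored relative evidence test, cutting the token count while preserving accuracy. Clark Hash provides a stateless codec for compact neural embedding storage, reducing space requirements by 32x with minimal accuracy loss. JacQuant introduces a quantization-aware training framework that learns Jacobian surrogates to stabilize and accelerate training, and reports higher accuracy than conventional methods for ultra-low-bit LLM quantization.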
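For context on the "conventional methods" that JacQuant is compared against: the abstract quoted in the source list names the Straight-Through Estimator (STE), which passes gradients through a non-differentiable quantizer as if it were the identity. The sketch below illustrates that baseline only, not JacQuant itself. The function names, the step size `scale`, the level range, and the toy regression problem are all assumptions made for illustration.

```python
import math


def quantize(w, scale=0.25, qmin=-8, qmax=7):
    """Uniform quantizer: snap w to the nearest multiple of `scale`,
    with the integer level clipped to [qmin, qmax]."""
    level = math.floor(w / scale + 0.5)  # round half up, avoids banker's rounding
    level = max(qmin, min(qmax, level))
    return level * scale


def ste_grad(w, upstream, scale=0.25, qmin=-8, qmax=7):
    """STE backward pass: pretend d quantize(w) / d w is 1 inside the
    representable range and 0 outside it."""
    inside = qmin * scale <= w <= qmax * scale
    return upstream if inside else 0.0


def train(xs, ys, w=0.0, lr=0.05, steps=200):
    """Fit ys ~ quantize(w) * xs by SGD on squared error.
    The forward pass uses the quantized weight; the update is applied
    to the latent full-precision weight through the STE."""
    for _ in range(steps):
        for x, y in zip(xs, ys):
            wq = quantize(w)
            err = wq * x - y
            grad_wq = 2.0 * err * x        # d loss / d wq
            w -= lr * ste_grad(w, grad_wq)  # STE: reuse it as d loss / d w
    return w, quantize(w)


xs = [1.0, 2.0, 3.0]
ys = [0.8 * x for x in xs]  # ideal weight 0.8 lies between grid points 0.75 and 1.0
latent_w, deployed_w = train(xs, ys)
```

Because the ideal weight in this toy setup falls between two grid points, the latent weight can end up hovering around the boundary between the two bins, which is one way to picture the brittleness near bin boundaries that the abstract mentions.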

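Impact: These advances in quantization and token pruning are expected to yield more efficient AI models, enabling broader deployment and lower compute costs.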

Ranking rationale: This cluster contains multiple arXiv papers detailing new research on AI model optimization techniques.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 9 sources. How we write summaries →


Sources [9]

  1. arXiv cs.AI TIER_1 English(EN) · Sai Sanjeet, Ian Colbert, Pablo Monteagudo-Lago, Giuseppe Franco, Yaman Umuroglu, Nicholas J. Fraser ·

    Exploring the Limits of Block Rotations in Post-Training Quantization

    arXiv:2601.22347v2 Announce Type: replace-cross Abstract: Recent post-training quantization (PTQ) methods have adopted block rotations to diffuse outliers prior to rounding. While this reduces the overhead of online full-vector rotations, the effect of block structure on outlier …

  2. arXiv cs.AI TIER_1 English(EN) · Geng Li, Guohao Chen, Ting Chen, Shilin Shan, Kuangji Zuo, Bofan Lyu, Tuo An, Gen Li, Jianfei Yang ·

    OccamToken: Training-Free, Budget-Adaptive Token Pruning for Efficient VLM Inference

    arXiv:2605.29657v1 Announce Type: cross Abstract: Vision-language models (VLMs) rely on long visual token sequences for visual understanding, making the prefill stage expensive in both computation and memory. Most existing pruning methods follow an absolute-ranking paradigm, assi…

  3. arXiv cs.AI TIER_1 English(EN) · Stanislav Kirdey, Clark Labs Inc ·

    Clark Hash: Stateless Sparse Johnson-Lindenstrauss Quantization for Neural Embeddings

    arXiv:2605.28034v1 Announce Type: new Abstract: Clark Hash is a small method for storing neural embeddings in less space. It normalizes each database vector, applies a deterministic sparse signed Johnson-Lindenstrauss projection, clips the result, and stores a fixed-width scalar-…

  4. arXiv cs.AI TIER_1 English(EN) · Zhanfeng Feng, Shuai Guo, Xin Di, Long Peng, Yang Cao, Zhengjun Zha ·

    Tail-Aware HiFloat4: W4A4 Post-Training Quantization for Wan2.2

    arXiv:2605.26628v1 Announce Type: new Abstract: This report describes Tail-Aware HiFloat4, our submission to the low-bit text-to-video generation quantization challenge. Our method adapts the public ViDiT-Q post-training quantization pipeline to Wan2.2 under the HiFloat4 numerica…

  5. Hugging Face Daily Papers TIER_1 English(EN) ·

    Clark Hash: Stateless Sparse Johnson-Lindenstrauss Quantization for Neural Embeddings

    Clark Hash is a compact, stateless codec that reduces neural embedding storage size by 32x through deterministic sparse projections and scalar quantization while maintaining high similarity accuracy.

  6. Hugging Face Daily Papers TIER_1 English(EN) ·

    Tail-Aware HiFloat4: W4A4 Post-Training Quantization for Wan2.2

    This report describes Tail-Aware HiFloat4, our submission to the low-bit text-to-video generation quantization challenge. Our method adapts the public ViDiT-Q post-training quantization pipeline to Wan2.2 under the HiFloat4 numerical format. We quantize the main linear layers in …

  7. arXiv cs.LG TIER_1 English(EN) · Kai Yi, Vignesh Vivekraja, Harshit Khaitan, Steven Li ·

    JacQuant: STE-Free Quantization-Aware Training via Learned Jacobian Surrogates

    arXiv:2605.25469v1 Announce Type: new Abstract: Quantization-aware training (QAT) is widely deployed but typically relies on the Straight-Through Estimator (STE), which passes gradients through non-differentiable quantizers by fiat. This often makes training brittle near bin boun…

  8. Hugging Face Daily Papers TIER_1 English(EN) ·

    JacQuant: STE-Free Quantization-Aware Training via Learned Jacobian Surrogates

    Quantization-aware training (QAT) is widely deployed but typically relies on the Straight-Through Estimator (STE), which passes gradients through non-differentiable quantizers by fiat. This often makes training brittle near bin boundaries and weakly aligned with the actual behavi…

  9. r/StableDiffusion TIER_2 English(EN) · /u/AgeNo5351 ·

    A Wan 2.2 post-training quant: 1 model instead of high + low

    https://www.reddit.com/r/StableDiffusion/comments/1tpcm59/a_wan_22_posttraining_quant_1_model_instead_of/
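Items 3 and 5 above describe Clark Hash as normalizing each vector, applying a deterministic sparse signed Johnson-Lindenstrauss projection, clipping the result, and storing a fixed-width scalar code. The following is a rough sketch of that general pipeline, not the paper's implementation: the hash-derived index and sign scheme, the rescaling factor, and the parameters `out_dim`, `nnz`, `clip`, `bits`, and `seed` are all assumptions chosen for illustration, and the codes are returned as a plain list of integers rather than packed bits.

```python
import hashlib
import math


def _hash_u64(*parts):
    """Deterministic 64-bit hash of the given parts (no stored state)."""
    data = ",".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def encode(vec, out_dim=16, nnz=4, clip=3.0, bits=4, seed=0):
    """Normalize, project with a hash-defined sparse signed matrix,
    clip, and quantize each coordinate to a `bits`-wide integer code."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    unit = [v / norm for v in vec]
    dim = len(unit)
    levels = (1 << bits) - 1
    rescale = math.sqrt(dim / nnz)  # keeps projected values roughly O(1)
    codes = []
    for row in range(out_dim):
        acc = 0.0
        for k in range(nnz):
            h = _hash_u64(seed, row, k)
            idx = (h >> 1) % dim              # which input coordinate to read
            sign = 1.0 if h & 1 else -1.0     # random-looking but reproducible sign
            acc += sign * unit[idx]
        acc = max(-clip, min(clip, acc * rescale))
        codes.append(round((acc + clip) / (2 * clip) * levels))
    return codes


def decode(codes, clip=3.0, bits=4):
    """Map integer codes back to approximate projected values."""
    levels = (1 << bits) - 1
    return [c / levels * 2 * clip - clip for c in codes]


embedding = [0.12, -0.5, 0.33, 0.9, -0.07, 0.41, -0.28, 0.6]
codes = encode(embedding)
approx = decode(codes)
```

Since the projection is derived entirely from a hash of `(seed, row, k)`, no projection matrix or codebook has to be stored alongside the codes, which is the sense in which a codec like this can be called stateless.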