PulseAugur

Activation Aware Quantization

PulseAugur coverage of Activation Aware Quantization — every cluster mentioning Activation Aware Quantization across labs, papers, and developer communities, ranked by signal.

Total · 30 days: 6 (90 days: 6)
Releases · 30 days: 0 (90 days: 0)
Papers · 30 days: 4 (90 days: 4)
Recent · page 1 of 1 · 6 items
  1. TOOL · CL_27223 ·

    ExLlamaV3, Unsloth Qwen, and Phi3 agent see major local AI updates

    This week's local AI news highlights significant updates to the ExLlamaV3 inference library, enhancing efficiency for running quantized Llama models on consumer GPUs. Additionally, new GGUF-quantized versions of Qwen 3.…

  2. RESEARCH · CL_23571 ·

    Local AI tools boost LLM speeds with new prediction and decoding techniques

    Recent updates in the local AI community are enhancing inference speeds and providing practical benchmarks for open-weight models. The llama.cpp project now supports Multi-Token Prediction (MTP), which has shown a 40% s…

  3. RESEARCH · CL_15961 ·

    New methods accelerate LLMs via efficient sparsification, quantization, and compression

    Researchers have developed several new methods for compressing and optimizing large language models (LLMs) to improve efficiency and reduce computational costs. SparseForge focuses on efficient semi-structured sparsific…

  4. RESEARCH · CL_14463 ·

    New research explores LLM security, efficiency, and training optimization

    Researchers are developing novel methods to enhance the efficiency and security of Large Language Models (LLMs). One approach, "Widening the Gap," exploits outlier injection to compromise LLM quantization, demonstrating…

  5. RESEARCH · CL_01274 ·

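    Hugging Face introduces advanced quantization techniques for efficient LLMs

    Researchers are developing advanced quantization techniques to improve the efficiency of large language models (LLMs). New methods such as AutoRound, LATMiX, and GSQ aim to reduce model size and compute requirements, enabling deployment on less powerful hardware. These methods focus on optimizing how model weights and activations are represented at lower bit widths, and some of them reach accuracy comparable to higher-precision models. Innovations include novel calibration strategies for post-training quantization and learnable affine transformations that improve robustness.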

  6. RESEARCH · CL_01035 ·

    Optimizing Transformer Inference: Techniques for Faster, Cheaper Large Models

    Large transformer models present significant inference challenges due to their substantial memory footprint and computation costs, which scale quadratically with input length. Researchers and practitioners are exploring…