GPTQ

PulseAugur coverage of GPTQ: every cluster mentioning GPTQ across labs, papers, and developer communities, ranked by signal.

Total · 30d: 16 (16 over 90d)

Releases · 30d: 0 (0 over 90d)

Papers · 30d: 10 (10 over 90d)

Sentiment · 30d: 6 days with sentiment data

RECENT · PAGE 1/1 · 16 TOTAL

TOOL · CL_113441 · Jun 27 · 10:53

Developer implements GPTQ quantization from scratch, achieving minimal performance loss

A developer detailed their process of implementing the GPTQ quantization method from scratch on a nanoGPT model. This technique reduces model size and speeds up inference by lowering the precision of weights, but unlike…
TOOL · CL_100041 · Jun 19 · 06:39

Quantization causes 7-point task accuracy drop that perplexity metrics miss

Nexus Labs found that quantizing a fine-tuned 14B agent model to INT4 with GPTQ caused a 7-point drop in multi-step task completion accuracy, despite perplexity metrics showing on…
TOOL · CL_98076 · Jun 18 · 04:00

New HeRo-Q framework enhances stable low-bit quantization for LLMs

Researchers have developed a new framework called HeRo-Q to improve the stability of low-bit quantization in large language models. This method addresses the 'low error, high loss' phenomenon by reshaping the loss lands…
TOOL · CL_84316 · Jun 11 · 01:13

LLM Quantization Formats: GGUF, GPTQ, AWQ, and NF4 Compared

The article compares four major LLM weight quantization formats: GGUF, GPTQ, AWQ, and NF4. Quantization is crucial for reducing model size to fit within limited hardware constraints, such as consumer GPUs or unified mem…
TOOL · CL_80007 · Jun 9 · 04:00

New paper details optimized quantization for LLMs

Researchers have published a paper detailing advancements in quantized matrix multiplication, specifically for large language models. The work, a follow-up to previous research, focuses on scenarios where the covariance…
RESEARCH · CL_66006 · Jun 2 · 04:00

New quantization methods improve AI model compression and spectral properties

Researchers have developed new methods for model quantization, a technique used to compress AI models. One approach, YAQA, introduces theoretical results for end-to-end error bounds in quantization, outperforming existi…
TOOL · CL_53214 · May 26 · 21:34

Ollama v0.30.0, Qwen3.5 35B, and 1-bit AI on WebGPU

Ollama's v0.30.0 pre-release is set to improve llama.cpp interoperability. Separately, a new Qwen3.5 35B model is available in GGUF and GPTQ formats, optimized for local inference on consumer GPUs. Additionally, PrismML…
RESEARCH · CL_48868 · May 21 · 22:23

New methods enhance LLM quantization for efficiency and accuracy

Researchers have developed several new methods to improve the efficiency and accuracy of quantizing large language models (LLMs). These techniques aim to reduce the memory footprint and computational cost of LLMs, makin…
RESEARCH · CL_35775 · May 17 · 18:19

llmcompressor tool enables LLM compression via FP8, GPTQ, SmoothQuant

A new open-source tool named llmcompressor allows developers to compress and benchmark instruction-tuned large language models. The tool demonstrates how to apply post-training quantization techniques such as FP8, GPTQ,…
TOOL · CL_30718 · May 13 · 16:47

New paper details improved quantization for LLM matrix multiplication

Researchers have published a paper detailing advancements in quantized matrix multiplication, specifically for large language models (LLMs). This second part of their work focuses on scenarios where the covariance matri…
TOOL · CL_27223 · May 11 · 21:34

ExLlamaV3, Unsloth Qwen, and Phi3 agent see major local AI updates

This week's local AI news highlights significant updates to the ExLlamaV3 inference library, enhancing efficiency for running quantized Llama models on consumer GPUs. Additionally, new GGUF-quantized versions of Qwen 3.…
RESEARCH · CL_15961 · May 5 · 04:00

New methods accelerate LLMs via efficient sparsification, quantization, and compression

Researchers have developed several new methods for compressing and optimizing large language models (LLMs) to improve efficiency and reduce computational costs. SparseForge focuses on efficient semi-structured sparsific…
RESEARCH · CL_11807 · Apr 30 · 18:55

New methods tackle LLM quantization for improved efficiency and accuracy

Researchers have developed several new methods to improve the efficiency of large language models (LLMs) through quantization. OSAQ focuses on suppressing weight outliers using a low-rank Hessian property for accurate l…
RESEARCH · CL_14463 · Apr 27 · 04:00

New research explores LLM security, efficiency, and training optimization

Researchers are developing novel methods to enhance the efficiency and security of Large Language Models (LLMs). One approach, "Widening the Gap," exploits outlier injection to compromise LLM quantization, demonstrating…
RESEARCH · CL_01274 · May 24 · 00:00

Hugging Face introduces advanced quantization techniques for efficient LLMs

Researchers are developing advanced quantization techniques to make large language models (LLMs) more efficient. New methods like AutoRound, LATMiX, and GSQ aim to reduce model size and computational requirements, enabl…
RESEARCH · CL_01035 · Jan 18 · 00:00

Optimizing Transformer Inference: Techniques for Faster, Cheaper Large Models

Large transformer models present significant inference challenges due to their substantial memory footprint and computation costs, which scale quadratically with input length. Researchers and practitioners are exploring…