GGUF
PulseAugur coverage of GGUF — every cluster mentioning GGUF across labs, papers, and developer communities, ranked by signal.
2 days with sentiment data
-
llama.cpp adds eval tool; MagicQuant v2.0 offers hybrid GGUF quants
The llama.cpp project has introduced llama-eval, a new tool for benchmarking local language models against standard datasets. Concurrently, MagicQuant v2.0 has released advanced hybrid GGUF quantization techniques, inte…
-
ExLlamaV3, Unsloth Qwen, and Phi3 agent see major local AI updates
This week's local AI news highlights significant updates to the ExLlamaV3 inference library, enhancing efficiency for running quantized Llama models on consumer GPUs. Additionally, new GGUF-quantized versions of Qwen 3.…
-
Local AI tools boost LLM speeds with new prediction and decoding techniques
Recent updates in the local AI community are enhancing inference speeds and providing practical benchmarks for open-weight models. The llama.cpp project now supports Multi-Token Prediction (MTP), which has shown a 40% s…
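Speedups like the one reported for Multi-Token Prediction can be sanity-checked with a back-of-envelope model borrowed from speculative decoding: if each verification step drafts `k` extra tokens and each draft token is accepted with probability `a` (acceptance stopping at the first rejection), the expected tokens emitted per forward pass is a geometric sum. This is a toy model, not llama.cpp's actual MTP implementation; `k` and `a` below are illustrative values.

```python
def expected_tokens_per_step(k_draft: int, accept_rate: float) -> float:
    # 1 guaranteed token per verification pass, plus each draft token i,
    # which survives only if all i drafts before it were accepted:
    # E = 1 + a + a^2 + ... + a^k
    return 1 + sum(accept_rate ** i for i in range(1, k_draft + 1))

# Illustrative: 2 draft tokens with 70% acceptance yields ~2.19 tokens
# per pass, i.e. roughly a 2x throughput gain if draft cost is negligible.
gain = expected_tokens_per_step(2, 0.7)
```

Under this model, a 40% speedup is plausible even with a single draft token and moderate acceptance rates.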
-
llama.cpp adds Sparse MoE support, Qwen3.6 GGUF, and WebWorld models for local AI
The llama.cpp project has been updated to support Xiaomi's MiMo-V2.5 Sparse MoE model, allowing local inference of large, parameter-efficient models. Additionally, a new uncensored Qwen3.6 27B model is now available in …
-
Ollama platform vulnerable to memory leaks via crafted GGUF files
A critical vulnerability, identified as CVE-2026-5757, has been discovered in the Ollama platform, potentially leading to memory leaks. The flaw is triggered by a specially crafted GGUF file. Security researcher Jeremy …
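Vulnerabilities triggered by crafted GGUF files are a reminder to validate the fixed-size header before trusting anything else in an untrusted file. The GGUF format opens with the 4-byte magic `GGUF`, a little-endian `uint32` version, then `uint64` tensor and metadata key-value counts. A minimal defensive check might look like the sketch below; the version set and sanity bounds are assumptions, not values from the Ollama advisory.

```python
import struct

GGUF_MAGIC = b"GGUF"
SUPPORTED_VERSIONS = {2, 3}   # assumption: versions your loader handles
MAX_TENSORS = 100_000         # hypothetical sanity bound
MAX_KV = 100_000              # hypothetical sanity bound

def check_gguf_header(data: bytes) -> tuple[int, int, int]:
    """Validate the fixed 24-byte GGUF header of an untrusted file.

    Returns (version, tensor_count, kv_count) or raises ValueError.
    """
    if len(data) < 24:
        raise ValueError("file too short for a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("bad magic: not a GGUF file")
    (version,) = struct.unpack_from("<I", data, 4)
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported GGUF version {version}")
    tensor_count, kv_count = struct.unpack_from("<QQ", data, 8)
    if tensor_count > MAX_TENSORS or kv_count > MAX_KV:
        raise ValueError("implausible counts: possible crafted file")
    return version, tensor_count, kv_count
```

Bounding the tensor and key-value counts up front prevents a hostile header from driving huge allocations in the metadata-parsing loop that follows.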
-
IBM releases Apache 2.0 licensed Granite 4.1 LLMs in 3B, 8B, 30B sizes
IBM has released its Granite 4.1 family of large language models, available in 3B, 8B, and 30B parameter sizes under an Apache 2.0 license. Unsloth has further provided quantized GGUF variants of the 3B model, offering …
-
RadLite fine-tunes small LLMs for CPU-deployable radiology AI
Researchers have developed RadLite, a method for fine-tuning small language models (SLMs) with 3-4 billion parameters for radiology tasks. This approach, utilizing LoRA fine-tuning on models like Qwen2.5-3B-Instruct and…
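The appeal of LoRA for work like this is the parameter count: instead of updating a frozen `d_out x d_in` weight matrix, it trains two low-rank factors `B (d_out x r)` and `A (r x d_in)`. The arithmetic below uses hypothetical dimensions for illustration, not RadLite's published configuration.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    # LoRA trains only the low-rank factors B (d_out x r) and A (r x d_in),
    # so the trainable count per adapted matrix is r * (d_out + d_in).
    return rank * (d_out + d_in)

# Hypothetical: a 2048x2048 projection in a ~3B model, rank-16 adapter.
full_params = 2048 * 2048                       # 4,194,304 if fully tuned
lora_params = lora_trainable_params(2048, 2048, 16)  # 65,536 with LoRA
```

At rank 16 this trains roughly 1/64th of the matrix's parameters, which is what makes fine-tuning 3-4B models practical on modest hardware.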
-
SGLang AI inference server hit with critical CVE-2026-5760 vulnerability
A critical security vulnerability (CVE-2026-5760) with a severity score of 9.8 has been identified in SGLang, an AI inference server. The issue arises from a poisoned GGUF model file containing a chat-template that SGLa…
-
Stateful Transformers boost streaming inference; Intel releases AutoRound quantization toolkit
A new paper introduces a stateful transformer inference engine that significantly speeds up processing for streaming data by maintaining a persistent KV cache. This approach allows for query latency that is independent …
-
Quantized Qwen3.6-27B model achieves 100k context on 16GB VRAM
A user on Reddit's r/LocalLLaMA has detailed a method for running the Qwen3.6-27B model on a system with 16GB of VRAM, achieving a context length of 100,000 tokens. The process involves creating a custom GGUF quantizati…
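Whether a 100k-token context fits in 16GB comes down mostly to KV-cache size, which grows linearly with context length and shrinks proportionally when the cache is quantized. The sketch below estimates it; the layer/head/dimension numbers are hypothetical stand-ins for a ~27B GQA model, not Qwen3.6's published architecture.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_len: int, bytes_per_elt: int) -> int:
    # Keys and values: 2 tensors per layer, each of shape
    # (ctx_len, n_kv_heads, head_dim), at bytes_per_elt per element.
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elt

# Hypothetical shape: 48 layers, 8 KV heads, head_dim 128, 100k context.
fp16_gib = kv_cache_bytes(48, 8, 128, 100_000, 2) / 2**30  # ~18.3 GiB
q8_gib = kv_cache_bytes(48, 8, 128, 100_000, 1) / 2**30    # ~9.2 GiB
```

Halving the cache to 8-bit (and further with 4-bit schemes) is what leaves room for the quantized weights alongside a 100k context on a 16GB card.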
-
Qwen3.6-27B model offers flagship coding performance in a smaller package
Qwen has released Qwen3.6-27B, an open-weight model that reportedly matches flagship-level coding performance. This new model significantly outperforms its predecessor, Qwen3.5-397B-A17B, while being substantially small…