PulseAugur

GGUF

PulseAugur coverage of GGUF — every cluster mentioning GGUF across labs, papers, and developer communities, ranked by signal.

Total · 30d: 12 (12 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 2 (2 over 90d)
[Chart panels: Tier mix (90d), Relationships, Sentiment (30d); 2 days with sentiment data]

RECENT · PAGE 1/1 · 11 TOTAL
  1. TOOL · CL_29138 ·

    llama.cpp adds eval tool; MagicQuant v2.0 offers hybrid GGUF quants

    The llama.cpp project has introduced llama-eval, a new tool for benchmarking local language models against standard datasets. Concurrently, MagicQuant v2.0 has released advanced hybrid GGUF quantization techniques, inte…

  2. TOOL · CL_27223 ·

    ExLlamaV3, Unsloth Qwen, and Phi3 agent see major local AI updates

    This week's local AI news highlights significant updates to the ExLlamaV3 inference library, enhancing efficiency for running quantized Llama models on consumer GPUs. Additionally, new GGUF-quantized versions of Qwen 3.…

  3. RESEARCH · CL_23571 ·

    Local AI tools boost LLM speeds with new prediction and decoding techniques

    Recent updates in the local AI community are enhancing inference speeds and providing practical benchmarks for open-weight models. The llama.cpp project now supports Multi-Token Prediction (MTP), which has shown a 40% s…

  4. TOOL · CL_21496 ·

    llama.cpp adds Sparse MoE support, Qwen3.6 GGUF, and WebWorld models for local AI

    The llama.cpp project has been updated to support Xiaomi's MiMo-V2.5 Sparse MoE model, allowing local inference of large, parameter-efficient models. Additionally, a new uncensored Qwen3.6 27B model is now available in …

  5. TOOL · CL_16585 ·

    Ollama platform vulnerable to memory leaks via crafted GGUF files

    A critical vulnerability, identified as CVE-2026-5757, has been discovered in the Ollama platform, potentially leading to memory leaks. The flaw is triggered by a specially crafted GGUF file. Security researcher Jeremy …
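Both GGUF-related CVEs in this feed stem from loaders trusting attacker-controlled file contents. As a minimal sketch of the defensive side, the snippet below validates a GGUF header (magic bytes, version, tensor and metadata counts per the published GGUF layout) before any parsing; the sanity bounds are illustrative choices, not values from the spec or from Ollama's code.

```python
import struct

GGUF_MAGIC = b"GGUF"
# Illustrative sanity bounds, not taken from the GGUF spec or any loader.
MAX_TENSORS = 1_000_000
MAX_KV_PAIRS = 1_000_000


def check_gguf_header(data: bytes) -> dict:
    """Validate a GGUF header before handing the file to a full parser.

    The GGUF file layout begins with: 4-byte magic "GGUF", uint32 version,
    uint64 tensor count, uint64 metadata key/value count (little-endian).
    """
    if len(data) < 24:
        raise ValueError("file too short to contain a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("bad magic: not a GGUF file")
    version, n_tensors, n_kv = struct.unpack_from("<IQQ", data, 4)
    # Crafted files often carry absurd counts here to drive
    # over-allocation or overflow in naive loaders; reject them early.
    if n_tensors > MAX_TENSORS or n_kv > MAX_KV_PAIRS:
        raise ValueError("header counts exceed sanity bounds")
    return {"version": version, "tensors": n_tensors, "kv_pairs": n_kv}
```

The same early-rejection idea applies to metadata values such as chat templates: treat every length and count in the file as hostile until bounds-checked.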

  6. RESEARCH · CL_15130 ·

    IBM releases Apache 2.0 licensed Granite 4.1 LLMs in 3B, 8B, 30B sizes

    IBM has released its Granite 4.1 family of large language models, available in 3B, 8B, and 30B parameter sizes under an Apache 2.0 license. Unsloth has further provided quantized GGUF variants of the 3B model, offering …

  7. RESEARCH · CL_14127 ·

    RadLite fine-tunes small LLMs for CPU-deployable radiology AI

    Researchers have developed RadLite, a method for fine-tuning small language models (SLMs) with 3-4 billion parameters for radiology tasks. This approach, utilizing LoRA fine-tuning on models like Qwen2.5-3B-Instruct and…

  8. RESEARCH · CL_09151 ·

    SGLang AI inference server hit with critical CVE-2026-5760 vulnerability

    A critical security vulnerability (CVE-2026-5760) with a severity score of 9.8 has been identified in SGLang, an AI inference server. The issue arises from a poisoned GGUF model file containing a chat-template that SGLa…

  9. RESEARCH · CL_09107 ·

    Stateful Transformers boost streaming inference; Intel releases AutoRound quantization toolkit

    A new paper introduces a stateful transformer inference engine that significantly speeds up processing for streaming data by maintaining a persistent KV cache. This approach allows for query latency that is independent …

  10. RESEARCH · CL_03569 ·

    Quantized Qwen3.6-27B model achieves 100k context on 16GB VRAM

    A user on Reddit's r/LocalLLaMA has detailed a method for running the Qwen3.6-27B model on a system with 16GB of VRAM, achieving a context length of 100,000 tokens. The process involves creating a custom GGUF quantizati…
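Long-context setups like this are usually constrained by the KV cache rather than the weights. As a rough sketch of the arithmetic (the layer, head, and dimension numbers below are hypothetical stand-ins, not the model's published config), the cache footprint scales linearly with context length and with bytes per element, which is why cache quantization is the lever:

```python
def kv_cache_bytes(n_layers: int, n_ctx: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int) -> int:
    """KV-cache size: keys + values, per layer, per position, per KV head."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem


# Hypothetical 27B-class GQA configuration (illustrative only):
layers, kv_heads, hdim = 48, 8, 128
ctx = 100_000

fp16 = kv_cache_bytes(layers, ctx, kv_heads, hdim, 2)  # 16-bit cache
q8 = kv_cache_bytes(layers, ctx, kv_heads, hdim, 1)    # 8-bit cache
print(f"fp16 KV cache: {fp16 / 2**30:.1f} GiB")  # ~18.3 GiB: over 16 GB alone
print(f"q8   KV cache: {q8 / 2**30:.1f} GiB")    # ~9.2 GiB: leaves headroom
```

Under these assumed numbers a 16-bit cache alone would exceed 16 GB at 100k tokens, so some combination of weight and cache quantization is required to fit.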

  11. RESEARCH · CL_01070 ·

    Qwen3.6-27B model offers flagship coding performance in a smaller package

    Qwen has released Qwen3.6-27B, an open-weight model that reportedly matches flagship-level coding performance. This new model significantly outperforms its predecessor, Qwen3.5-397B-A17B, while being substantially small…