PulseAugur

ik_llama.cpp

PulseAugur coverage of ik_llama.cpp — every cluster mentioning ik_llama.cpp across labs, papers, and developer communities, ranked by signal.

Total · 30d: 6 (6 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 0 (0 over 90d)
Charts: TIER MIX · 90D, TOPICS, SENTIMENT · 30D (2 days with sentiment data)

RECENT · PAGE 1/1 · 6 TOTAL
  1. TOOL · CL_104044

    ik_llama.cpp adds support for Laguna M.1 GGUF model

    Pull request #2003 has been submitted to the ik_llama.cpp repository to add support for the Laguna M.1 GGUF model. It aims to integrate the new model format into the existing code…

  2. TOOL · CL_103504

    New --numa mirror mode boosts CPU inference performance

    A developer has forked the ik_llama.cpp project to introduce a new "--numa mirror" mode designed to enhance performance on multi-socket CPU systems. This mode addresses the significant performance penalty incurred when …

  3. TOOL · CL_103503

    New Qwen3.6-27B quantizations optimize for 16GB VRAM, while multi-GPU setups show strong performance

    The Qwen3.6-27B model has seen new experimental quantizations released for local LLM inference, focusing on optimizing performance for NVIDIA GPUs with 16GB of VRAM. One quantization, IQ4_KS, is tweaked to improve logic…

  4. TOOL · CL_55274

    Qwen 3.5 35B model runs at 10.33 t/s on $300 laptop

    A user on r/LocalLLaMA has detailed their experience running the Qwen 3.5 35B model on a budget laptop. They achieved an inference speed of 10.33 tokens per second on a $300 Lenovo IdeaPad Slim 3i wit…

  5. TOOL · CL_43106

    Qwen 3.6 model hits 110 tokens/sec on consumer GPUs via llama.cpp

    The 35-billion-parameter version of the open-weight model Qwen 3.6 has reached an inference speed of 110 tokens per second on consumer GPUs with 12GB of VRAM. This performance was enabled by a specialized var…

  6. RESEARCH · CL_03577

    llama.cpp and ik_llama.cpp add FP4 inference support for VRAM savings

    The llama.cpp and ik_llama.cpp projects have both integrated support for FP4 (4-bit floating-point) inference, which reduces VRAM requirements for quantized models. llama.cpp now includes NVFP4, an NVIDIA-specific format, w…