ik_llama.cpp

PulseAugur coverage of ik_llama.cpp — every cluster mentioning ik_llama.cpp across labs, papers, and developer communities, ranked by signal.

Total · 90d: 6

Releases · 90d: 0

Papers · 90d: 0

Sentiment · 30d: 2 days with sentiment data

RECENT · PAGE 1/1 · 6 TOTAL

TOOL · CL_104044 · Jun 22 · 17:48

ik_llama.cpp adds support for Laguna M.1 GGUF model

Pull request #2003 has been submitted to the ik_llama.cpp repository to add support for the Laguna M.1 GGUF model. The update aims to integrate the new model format into the existing code…

TOOL · CL_103504 · Jun 21 · 17:37

New --numa mirror mode boosts CPU inference performance

A developer has forked the ik_llama.cpp project to introduce a new "--numa mirror" mode designed to enhance performance on multi-socket CPU systems. This mode addresses the significant performance penalty incurred when …

TOOL · CL_103503 · Jun 21 · 14:35

New Qwen3.6-27B quantizations optimize for 16GB VRAM, while multi-GPU setups show strong performance

New experimental quantizations of the Qwen3.6-27B model have been released for local LLM inference, optimized for NVIDIA GPUs with 16GB of VRAM. One quantization, IQ4_KS, is tweaked to improve logic…

TOOL · CL_55274 · May 27 · 19:26

Qwen 3.5 35B model runs at 10.33 t/s on $300 laptop

A user on Reddit's r/LocalLLaMA subreddit described running the Qwen 3.5 35B model on a budget laptop. They achieved an inference speed of 10.33 tokens per second on a $300 Lenovo Ideapad Slim 3i wit…

TOOL · CL_43106 · May 21 · 21:33

Qwen 3.6 model hits 110 tokens/sec on consumer GPUs via llama.cpp

The 35-billion-parameter version of the open-weight Qwen 3.6 model has reached an inference speed of 110 tokens per second on consumer GPUs with 12GB of VRAM. This performance was enabled by a specialized var…

RESEARCH · CL_03577 · Apr 25 · 15:42

llama.cpp and ik_llama.cpp add FP4 inference support for VRAM savings

The llama.cpp and ik_llama.cpp projects have both integrated support for FP4 (4-bit floating-point) inference. llama.cpp now includes NVFP4, an NVIDIA-specific format, w…