PulseAugur

llama.cpp

PulseAugur coverage of llama.cpp — every cluster mentioning llama.cpp across labs, papers, and developer communities, ranked by signal.

Total · 30 days: 98 (90 days: 98)
Releases · 30 days: 0 (90 days: 0)
Papers · 30 days: 7 (90 days: 7)
Timeline
  1. 2026-05-25 research_milestone A fix is expected for llama.cpp to address split mode tensor crashes.
  2. 2026-05-25 product_launch A pull request was submitted to improve checkpoint creation and context handling in llama.cpp.
  3. 2026-05-24 product_launch llama.cpp released version b9305 with pre-compiled binaries for multiple platforms.
  4. 2026-05-17 research_milestone llama.cpp implements MTP optimizations and prompt decode improvements for faster local AI inference.
  5. 2026-05-14 product_launch A performance-optimized fork of llama.cpp was released with new features.
  6. 2026-05-12 product_launch The llama.cpp project integrates the llama-eval tool for model benchmarking.
Sentiment · 30 days

19 days with sentiment data

Recent · Page 1 of 5 · 98 items total
  1. TOOL · CL_49945 ·

    llama.cpp adds CUDA FWHT for faster KV cache quantization

    A pull request to the llama.cpp project introduces a CUDA implementation of the Fast Walsh-Hadamard Transform (FWHT). This optimization, developed by user am17an, aims to speed up operations when quantizing the key-valu…

  2. TOOL · CL_49850 ·

    llama.cpp split mode tensor fix to resolve multi-GPU crashes

    A fix is reportedly incoming for the llama.cpp project to address crashes related to split mode tensor operations. This issue has been causing instability, particularly for users employing multiple GPUs, with tests show…

  3. MEME · CL_49852 ·

    RTX 3060 users seek best coding LLM and setup

    A user on the r/LocalLLaMA subreddit is seeking recommendations for the best coding-focused large language model that can run on hardware with 12GB of VRAM, specifically an RTX 3060. The user is also inquiring about opt…

  4. TOOL · CL_49718 ·

    Developer runs Anthropic's Claude Code locally for free using Qwen model

    A developer successfully ran Anthropic's Claude Code locally for four hours, processing 7 million tokens without incurring API costs. This was achieved by routing Claude Code's requests through LiteLLM to a local Qwen3.…

  5. TOOL · CL_49509 ·

    Old Mac Pro repurposed for local LLM tasks with new drivers

    An old Mac Pro, originally costing nearly £10,000, is being repurposed for local LLM work thanks to new Linux drivers that enable its D700 GPUs. The machine, equipped with 64GB of RAM and 24 cores, can now run models vi…

  6. MEME · CL_49510 ·

    llama.cpp user reports persistent out-of-memory errors

    A user on Reddit's r/LocalLLaMA subreddit is experiencing a persistent out-of-memory (OOM) issue with the llama.cpp software. The problem causes the process to consume increasing amounts of system RAM over 20-40 minutes…

  7. TOOL · CL_48548 ·

    llama.cpp update targets faster agentic coding by optimizing context handling

    A pull request for the llama.cpp project aims to improve the responsiveness of agentic coding workflows. The proposed changes address issues where context rewriting by tools or models could force full prompt reprocessin…

  8. TOOL · CL_48575 ·

    llama.cpp releases b9309 with integer overflow fixes

    The llama.cpp project has released version b9309, which includes fixes for integer overflow issues. This release is part of ongoing development and maintenance for the C/C++ implementation of Llama models.

  9. MEME · CL_48207 ·

    LocalLLaMA user sees doubled inference speed with Qwen model after CPU parameter change

    A user on Reddit's r/LocalLLaMA subreddit is seeking assistance understanding unexpected performance gains when running the Qwen3.6-35B-A3B-UD-Q4_K_XL model. They observed a doubling of inference speed, from 17 to 34 to…

  10. TOOL · CL_48199 ·

    hipEngine offers faster Qwen 3.6 LLM inference on AMD RDNA3 GPUs

    A new open-source inference engine called hipEngine has been developed for AMD's RDNA3 GPUs, enabling faster native inference of the Qwen 3.6 large language model. The engine, written in Python with a HIP/C++ core, util…

  11. TOOL · CL_47461 ·

    llama.cpp adds native tools, Qwen releases 35B GGUF model

    The llama.cpp project has integrated native tools, including shell command execution and file editing, directly into its server, enabling local large language models to perform actions and automate tasks. This advanceme…

  12. COMMENTARY · CL_48414 ·

    LocalLLaMA user seeks harness for multi-agent Qwen 3.6 setup

    A user on Reddit's r/LocalLLaMA subreddit is seeking recommendations for an open-source harness to manage multiple local AI agents. They are currently using Qwen 3.5/3.6 27B models on a Windows 10 machine with an RTX 30…

  13. MEME · CL_48210 ·

    LocalLLaMA user seeks VRAM optimization for smaller models

    A user on the r/LocalLLaMA subreddit is seeking assistance with optimizing their GPU VRAM usage for running smaller language models. Despite successfully running larger models like Gemma4 26B and Qwen 3.6 35B MoEs, they…

  14. TOOL · CL_47069 ·

    Developer runs LLMs on $50 AMD RX 580 GPU using Vulkan

    A developer demonstrated running large language models and image generation software on an older AMD RX 580 GPU with 8GB of VRAM, a feat previously thought impossible for such hardware. By leveraging the Vulkan backend …

  15. TOOL · CL_48218 ·

    llama.cpp server enables RAG and shell commands via multi-sandbox setup

    A user on Reddit's r/LocalLLaMA shared a detailed method for enabling Retrieval Augmented Generation (RAG) and other command-line functionalities within the llama.cpp server's web UI. This approach involves enabling nat…

  16. TOOL · CL_46825 ·

    User migrates AI browser app cluster from LM Studio to llama.cpp

    A user is migrating their AI browser application cluster from LM Studio to llama.cpp. This move is motivated by a desire to avoid being tied to a single company's offerings. The application is intended for chatting with…

  17. TOOL · CL_47640 ·

    llama.cpp releases b9305 with broad platform support

    The llama.cpp project has released version b9305, introducing a CMake fix for UI builds and providing pre-compiled binaries for a wide range of platforms. These include macOS, iOS, various Linux distributions (CPU, Vulk…

  18. TOOL · CL_46390 ·

    Qwen 3.6 models show speed gains with MTP, but context window shrinks

    A technical analysis explores the performance of Qwen 3.6's 27B and 35B models when using Multi-Token Prediction (MTP), a speculative decoding technique. The tests, conducted on a 16GB VRAM GPU, reveal that MTP can sign…

  19. TOOL · CL_48213 ·

    llama.cpp server adds native tools for agent-like functionality

    The llama.cpp server now includes experimental native support for a suite of tools, enabling it to function as a basic agent harness. These tools, including file operations and shell command execution, can be enabled vi…

  20. TOOL · CL_45782 ·

    Redditor uses 768GB of used Optane RAM to run 1T-parameter LLM locally

    A Redditor has successfully run a 1-trillion-parameter LLM, specifically Kimi K2.5, locally on a single GPU workstation by utilizing 768GB of second-hand Intel Optane Persistent Memory modules as RAM. This setup achieve…