llama.cpp project releases multiple updates with broad platform support

By PulseAugur Editorial · [6 sources] · 2026-05-24 02:56

The llama.cpp project has released six builds: b9315, b9313, b9311, b9310, b9305, and b9301. Changes include parallelizing quantization look-up table (LUT) initialization with OpenMP, fixing checkpoint creation in the server component, updating the vendored cpp-httplib to 0.45.1, and fixing the UI build under CMake. The releases also provide pre-compiled binaries for macOS, iOS, Linux, Android, and Windows, with support for compute backends including Vulkan, ROCm, OpenVINO, SYCL, and CUDA.

IMPACT Provides updated tooling for running LLMs on diverse hardware, improving accessibility and performance for developers and users.

RANK_REASON The cluster consists of multiple releases of a software project (llama.cpp) that provides tools for running large language models, rather than a new frontier model release or significant research paper.

AI-generated summary · Google Gemini · from 6 sources.

COVERAGE [6]

llama.cpp — Releases TIER_1 (SO) · github-actions[bot] · 2026-05-25 18:40

b9315

llama : document that only one on-device state can be saved per sequence (#23520: https://github.com/ggml-org/llama.cpp/pull/23520). macOS/iOS: …

llama.cpp — Releases TIER_1 (SO) · github-actions[bot] · 2026-05-25 17:14

b9313

ggml : Parallelize quant LUT init (#23595: https://github.com/ggml-org/llama.cpp/pull/23595). Use OpenMP to parallelize iq2xs_init_impl and iq3xs_init_impl. Move the OpenMP detection from ggml…
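
To illustrate the general idea behind this entry, here is a minimal sketch of filling a look-up table with an OpenMP parallel loop. It is not the llama.cpp code: the table size, the `lut_entry` function, and the other names are hypothetical, and it assumes each entry can be computed independently of the others, which the release note does not confirm for iq2xs_init_impl or iq3xs_init_impl.

```c
#include <assert.h>
#include <stdint.h>

#define LUT_SIZE 4096  /* hypothetical table size, chosen for illustration */

static uint32_t lut[LUT_SIZE];

/* Hypothetical per-entry computation standing in for the real work. */
static uint32_t lut_entry(uint32_t i) {
    uint32_t x = i * 2654435761u;  /* unsigned multiply wraps modulo 2^32 */
    x ^= x >> 15;
    return x;
}

/* Fill the table. Each iteration writes only lut[i], so the iterations
   are independent and OpenMP may split them across threads. Compiled
   without OpenMP support, the pragma is ignored and the loop runs serially. */
void lut_init(void) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < LUT_SIZE; ++i) {
        lut[i] = lut_entry((uint32_t)i);
    }
}

/* Recompute every entry serially and compare it against the table. */
void lut_self_check(void) {
    for (int i = 0; i < LUT_SIZE; ++i) {
        assert(lut[i] == lut_entry((uint32_t)i));
    }
}
```

With GCC or Clang, OpenMP is typically enabled with `-fopenmp`. Because no iteration reads another iteration's output, the parallel and serial builds should produce the same table, which is what `lut_self_check` verifies.
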
llama.cpp — Releases TIER_1 (SO) · github-actions[bot] · 2026-05-25 15:57

b9311

vendor : update cpp-httplib to 0.45.1 (#23639: https://github.com/ggml-org/llama.cpp/pull/23639). macOS/iOS: …

llama.cpp — Releases TIER_1 (SO) · github-actions[bot] · 2026-05-25 11:52

b9310

server: fix checkpoints creation (#22929: https://github.com/ggml-org/llama.cpp/pull/22929). common : add common_chat_split_by_role; cont : fix spans to reach end of message; …

llama.cpp — Releases TIER_1 (SO) · github-actions[bot] · 2026-05-24 11:33

b9305

cmake : fix ui build (#23592: https://github.com/ggml-org/llama.cpp/pull/23592). cmake/ui : add -fPIC to llama-ui static lib; cmake : rename host compiled embed helper; …

llama.cpp — Releases TIER_1 (SO) · njsyw1997 · 2026-05-24 02:56

b9301

hexagon: apply repl optimization in flash attn softmax as #22993 (https://github.com/ggml-org/llama.cpp/pull/22993) (#23: https://github.com/ggml-org/llama.cpp/issues/23…
