llama.cpp Releases Enhance Performance and Add New Features

By PulseAugur Editorial · [5 sources] · 2026-06-12 05:17

The llama.cpp project has published five releases between b9603 and b9608. Release b9608 updates the vendored cpp-httplib to 0.47.0 and provides pre-compiled binaries for macOS, Linux, Android, and Windows. Release b9606 adds EAGLE3 speculative decoding support. Release b9605 adds support for concatenating scalar types in the CUDA backend. Release b9604 fixes the CI build and release for the SYCL backend. Release b9603 adds q5_0 and q5_1 GEMM and GEMV OpenCL kernels for Adreno GPUs, which targets certain mobile devices.

IMPACT These updates broaden llama.cpp's backend coverage (CUDA, OpenCL on Adreno, SYCL) and add a new speculative decoding method for running large language models on diverse hardware.

RANK_REASON This is a software release for a tool that facilitates running LLMs, not a new frontier model release or significant research paper.


AI-generated summary · Google Gemini · from 5 sources.

COVERAGE [5]

llama.cpp — Releases TIER_1 (SO) · github-actions[bot] · 2026-06-12 10:03

b9608

vendor : update cpp-httplib to 0.47.0 (#24395, https://github.com/ggml-org/llama.cpp/pull/24395) Signed-off-by: Adrien Gallouët [email protected] …

llama.cpp — Releases TIER_1 (SO) · github-actions[bot] · 2026-06-12 08:47

b9606

spec: add EAGLE3 speculative decoding support (#18039, https://github.com/ggml-org/llama.cpp/pull/18039)
- llama : enable layer input extraction
- spec: support eagle3
- …

llama.cpp — Releases TIER_1 (SO) · github-actions[bot] · 2026-06-12 08:13

b9605

ggml: support concat for scalar types at cuda backend (#24011, https://github.com/ggml-org/llama.cpp/pull/24011)
- cuda: support concat for scalar types
- Update concat.cu
- …

llama.cpp — Releases TIER_1 (SO) · github-actions[bot] · 2026-06-12 07:28

b9604

[SYCL] Fix CI build & release for SYCL backend (#24387, https://github.com/ggml-org/llama.cpp/pull/24387)
- restore SYCL build and release, remove github cache
- modify for test …

llama.cpp — Releases TIER_1 (SO) · github-actions[bot] · 2026-06-12 05:17

b9603

opencl: add q5_0/q5_1 gemm and gemv kernels for Adreno (#24319, https://github.com/ggml-org/llama.cpp/pull/24319)
- opencl: add q5_0 adreno support
- opencl: add q5_1 adreno support
- …

