llama.cpp releases add new tensor support and bug fixes

By PulseAugur Editorial · [8 sources] · 2026-05-22 21:38

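The llama.cpp project has published a series of releases. The latest, b9297, adds NVFP4 MTP scale tensors and links Qwen3.5 MTP tensors. The preceding builds are mostly fixes and backend work: b9296 corrects which ggml interface method is checked before the fallback 2D get is used, b9295 fixes find_package of SPIRV-Headers for Vulkan on Windows, and b9294 generalizes the OpenCL Adreno MoE kernels. Builds b9289 through b9292 cover SYCL changes (gated_delta_net with K>1, Level Zero detection, MoE prefill throughput) and an integer overflow fix in perplexity. The releases provide pre-compiled binaries for a wide range of operating systems and hardware architectures, including macOS, Linux, Android, and Windows, with support for compute backends such as CUDA, ROCm, Vulkan, and SYCL.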

IMPACT These releases extend model tensor support and improve the SYCL, Vulkan, and OpenCL backends, broadening the range of hardware on which llama.cpp can run LLMs.

RANK_REASON The cluster contains multiple releases of an open-source project that provides tools for running large language models, indicating ongoing development and updates.

AI-generated summary · Google Gemini · from 8 sources.

COVERAGE [8]

llama.cpp — Releases TIER_1 (SO) · github-actions[bot] · 2026-05-23 17:17

b9297

model : add NVFP4 MTP scale tensors (#23563: https://github.com/ggml-org/llama.cpp/pull/23563). Add NVFP4 MTP scale tensors; Link Qwen3.5 MTP tensors; Aligned null…

llama.cpp — Releases TIER_1 (SO) · github-actions[bot] · 2026-05-23 13:01

b9296

ggml : Check the right iface method before using the fallback 2d get (#23514: https://github.com/ggml-org/llama.cpp/pull/23514). macOS/iOS: https://github.co…

llama.cpp — Releases TIER_1 (SO) · github-actions[bot] · 2026-05-23 09:57

b9295

vulkan: fix windows find_package of SPIRV-Headers (#23215: https://github.com/ggml-org/llama.cpp/pull/23215). vulkan: fix windows find_package of SPIRV-Headers; not windows-only…

llama.cpp — Releases TIER_1 (SO) · github-actions[bot] · 2026-05-23 01:51

b9294

opencl: generalize Adreno MoE kernels on M (#23449: https://github.com/ggml-org/llama.cpp/pull/23449). macOS/iOS: https://github.com/ggml-org/llama.cpp/relea…

llama.cpp — Releases TIER_1 (SO) · github-actions[bot] · 2026-05-22 22:19

b9291

SYCL: improve MoE prefill throughput (#23142: https://github.com/ggml-org/llama.cpp/pull/23142). change k_copy_src1_to_contiguous so that uses a precomputed contiguous mapping where all ro…

llama.cpp — Releases TIER_1 (SO) · github-actions[bot] · 2026-05-22 22:19

b9292

perplexity : fix integer overflow (#23496: https://github.com/ggml-org/llama.cpp/pull/23496). Co-authored-by: Stanisław Szymczyk [email protected] …

llama.cpp — Releases TIER_1 (SO) · github-actions[bot] · 2026-05-22 22:14

b9290

sycl : Level Zero detection in ggml_sycl_init (#23097: https://github.com/ggml-org/llama.cpp/pull/23097). [SYCL] Centralize Level Zero detection in ggml_sycl_init; use the same wor…

llama.cpp — Releases TIER_1 (SO) · github-actions[bot] · 2026-05-22 21:38

b9289

SYCL : gated_delta_net K>1 (#23174: https://github.com/ggml-org/llama.cpp/pull/23174). sycl_gated_delta_net K>1; editor_config. macOS/iOS…
