sycl : port multi-column MMVQ from CUDA backend (~45% speculative decoding speedup on Intel Arc) by masonmilby · Pull Request #21845 · ggml-org/llama.cpp

llama.cpp SYCL backend to gain multi-column MMVQ port, speeding up Intel Arc GPUs

By the PulseAugur editorial team · [1 source] · 2026-06-05 18:51

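A pull request (#21845, by masonmilby) has been opened against the llama.cpp project to port multi-column MMVQ, the quantized matrix-vector multiplication path, from the CUDA backend to the SYCL backend. The port aims to improve performance for Intel Arc GPU users, and the pull request reports a speculative decoding speedup of about 45%. Users with compatible Intel hardware can benefit from the optimization by updating llama.cpp once the change is available in their build.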

Impact: Improves local LLM inference performance on Intel hardware, making local inference more accessible.

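Ranking rationale: This is a code contribution to an open-source project that improves hardware support and performance for a specific group of users.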
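To make "multi-column" concrete, the sketch below multiplies one block-quantized weight matrix by several activation columns in a single pass, so each weight block is loaded once and reused for every column. This is presumably what helps speculative decoding, where several draft tokens are verified together. It is an illustrative CPU sketch, not the code from the pull request: `BlockQ8`, `kBlock`, and `mul_mat_vec_q_multicol` are hypothetical names, the 8-bit block format is simplified, and activations are kept in float for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified 8-bit block format: kBlock weights share one scale.
constexpr int kBlock = 32;

struct BlockQ8 {
    float  scale;       // dequantization scale for the whole block
    int8_t q[kBlock];   // quantized weights
};

// Computes y[c * nrows + r] = sum_k W[r][k] * x[c * K + k] for ncols columns,
// where K = nblocks * kBlock.
//   w: nrows * nblocks blocks, row-major
//   x: ncols activation columns, each column stored contiguously (K floats)
//   y: ncols * nrows outputs, resized and overwritten by this function
void mul_mat_vec_q_multicol(const std::vector<BlockQ8>& w,
                            const std::vector<float>& x,
                            std::vector<float>& y,
                            std::size_t nrows,
                            std::size_t nblocks,
                            std::size_t ncols) {
    const std::size_t K = nblocks * kBlock;
    y.assign(ncols * nrows, 0.0f);

    for (std::size_t r = 0; r < nrows; ++r) {
        for (std::size_t b = 0; b < nblocks; ++b) {
            // The weight block is fetched once ...
            const BlockQ8& blk = w[r * nblocks + b];

            // ... and reused for every activation column. A single-column
            // routine would have to re-read the weights once per column.
            for (std::size_t c = 0; c < ncols; ++c) {
                const float* xc = &x[c * K + b * kBlock];
                float acc = 0.0f;
                for (int i = 0; i < kBlock; ++i) {
                    acc += static_cast<float>(blk.q[i]) * xc[i];
                }
                y[c * nrows + r] += blk.scale * acc;
            }
        }
    }
}
```

With `ncols` set to 1 this reduces to an ordinary quantized matrix-vector product. The benefit of the multi-column form comes from amortizing weight reads across the columns, which matters because this kind of workload tends to be limited by memory bandwidth.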


AI-generated summary · Google Gemini · from 1 source.


Sources [1]

r/LocalLLaMA · /u/pmttyji · 2026-06-05 18:51

sycl : port multi-column MMVQ from CUDA backend (~45% speculative decoding speedup on Intel Arc) by masonmilby · Pull Request #21845 · ggml-org/llama.cpp

https://www.reddit.com/r/LocalLLaMA/comments/1txtuzk/sycl_port_multicolumn_mmvq_from_cuda_backend_45/
