I forked ik_llama.cpp and added a "--numa mirror" mode to maximize performance on multi-socket CPU systems. Just sharing and looking for testers!

New "--numa mirror" mode improves CPU inference performance

By the PulseAugur editorial team · [1 source] · 2026-06-21 17:37

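A developer has forked the ik_llama.cpp project and introduced a new "--numa mirror" mode aimed at improving performance on multi-socket CPU systems. The mode creates a copy of the model weights and KV cache for each CPU socket, which avoids the significant performance penalty incurred when a CPU accesses non-local memory. Although this requires more RAM, it allows all CPU cores on all sockets to be used to accelerate inference, unlike the "--numa isolate" mode, which is limited to a single socket. The developer is looking for testers to evaluate the performance gains on a variety of hardware configurations.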
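The RAM trade-off described above can be made concrete with a rough estimate. The sketch below assumes the simplest reading of the summary: the weights and the KV cache are each replicated once per socket, and nothing else is counted. The function name and the example sizes are hypothetical and are not taken from the fork or from any measurement.

```python
GIB = 1024 ** 3


def mirror_ram_bytes(weights_bytes: int, kv_cache_bytes: int, sockets: int) -> int:
    """Estimate RAM use when weights and KV cache are copied once per socket.

    Assumption: full replication per socket, with no other overhead counted.
    """
    if sockets < 1:
        raise ValueError("sockets must be at least 1")
    return (weights_bytes + kv_cache_bytes) * sockets


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    weights = 40 * GIB
    kv_cache = 8 * GIB
    for sockets in (1, 2, 4):
        total = mirror_ram_bytes(weights, kv_cache, sockets)
        print(f"{sockets} socket(s): about {total // GIB} GiB")
```

Under this assumption the memory cost grows linearly with the socket count, so a two-socket machine would need roughly twice the RAM of a single copy.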

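Impact: This optimization can improve inference speed for users of multi-socket CPU systems, potentially making local LLM deployment more efficient.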

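Ranking rationale: This is a fork of an existing project that adds a new performance-optimization feature, rather than a new release or a major industry event.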


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/LocalLLaMA · /u/_TheWolfOfWalmart_ · 2026-06-21 17:37

I forked ik_llama.cpp and added a "--numa mirror" mode to maximize performance on multi-socket CPU systems. Just sharing and looking for testers!

GitHub: https://github.com/mikechambers84/ik_llama.cpp/tree/numa-mirror Be sure to check out the numa-mirror branch. Sharing …
