PSA: Test your "threads" argument in llama.cpp (+80% performance in my case)

llama.cpp performance rises about 80% after thread-count tuning

By PulseAugur Editorial · [1 source] · 2026-06-12 00:01

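A user on Reddit's r/LocalLLaMA subreddit found a large performance gain in the llama.cpp inference engine by adjusting the `--threads` parameter. The user had initially assumed that on a hybrid CPU, limiting the thread count to the number of performance cores was optimal. Testing with the Gemma 4 26B A4B QAT model showed otherwise: on an 18-core CPU (6 performance cores, 12 efficiency cores), raising the thread count to 16 improved performance by about 80%. The finding suggests that users should try thread counts above the performance-core count to maximize inference speed, especially in CPU-only or hybrid CPU/GPU setups.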
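A sweep like the one described can be scripted rather than run by hand. The following is a minimal sketch, not the Reddit author's method: it assumes llama.cpp's `llama-bench` tool is on the PATH, that its `-t` flag accepts a comma-separated list of thread counts, and that its JSON output contains `n_threads`, `n_gen`, and `avg_ts` (tokens per second) fields. The model path and all helper names are hypothetical.

```python
import json
import shutil
import subprocess


def candidate_thread_counts(p_cores: int, e_cores: int) -> list[int]:
    """Thread counts to try: from the P-core count up to all cores, in steps of 2."""
    total = p_cores + e_cores
    counts = set(range(p_cores, total + 1, 2))
    counts.update({p_cores, total})  # always include both endpoints
    return sorted(counts)


def bench_command(binary: str, model_path: str, threads: list[int]) -> list[str]:
    """Build a llama-bench invocation that tests every thread count in one run."""
    thread_arg = ",".join(str(t) for t in threads)
    return [binary, "-m", model_path, "-t", thread_arg, "-o", "json"]


def best_generation_threads(rows: list[dict]) -> int:
    """Return the thread count with the highest token-generation throughput.

    Assumes each row has n_threads, n_gen and avg_ts; rows with n_gen == 0
    (prompt-processing tests) are ignored.
    """
    gen_rows = [r for r in rows if r.get("n_gen", 0) > 0]
    return max(gen_rows, key=lambda r: r["avg_ts"])["n_threads"]


if __name__ == "__main__":
    binary = shutil.which("llama-bench")
    model_path = "model.gguf"  # hypothetical path; substitute your own model
    threads = candidate_thread_counts(p_cores=6, e_cores=12)
    if binary is None:
        print("llama-bench not found; would test thread counts:", threads)
    else:
        cmd = bench_command(binary, model_path, threads)
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        print("Best --threads for generation:", best_generation_threads(json.loads(out)))
```

Stepping by 2 keeps the sweep short. The best value depends on the specific CPU, model, and GPU offload settings, so the result from one machine should not be assumed to transfer to another.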

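Impact: Optimizing the thread count can yield significant performance gains for local LLM inference, potentially making larger models more practical to run on consumer hardware.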

Ranking rationale: A user-discovered optimization for an open-source inference engine.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/LocalLLaMA · /u/AXYZE8 · 2026-06-12 00:01

PSA: Test your "threads" argument in llama.cpp (+80% performance in my case)

When GPT-OSS 120B was released last year, I played around and tried to maximize its performance. One thing that many people pointed out was that for a hybrid CPU (Performance + Efficiency cores) you should use only P-cores with "--threads"…
