I can fit 28% more context after building llama.cpp with OpenBLAS. Huh?

llama.cpp fits 28% more context when built with OpenBLAS

By the PulseAugur editorial team · [1 source] · 2026-06-04 16:58

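A user on Reddit's r/LocalLLaMA found that building llama.cpp with OpenBLAS support in addition to Vulkan support let noticeably more context fit in VRAM. With the Qwen 3.6 27B model, the usable context grew from about 87,808 tokens to 112,896 tokens. The user is trying to determine whether this is expected behavior, a bug, or an anomaly.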
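As a quick check on the headline figure, the sketch below recomputes the relative gain from the two token counts quoted in the summary. The function and variable names are illustrative only and do not come from the original post or from llama.cpp.

```python
def percent_increase(before: int, after: int) -> float:
    """Return the relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Token counts as reported in the summary (names are hypothetical).
vulkan_only_ctx = 87_808        # build with the Vulkan backend only
vulkan_openblas_ctx = 112_896   # build with Vulkan and OpenBLAS

extra_tokens = vulkan_openblas_ctx - vulkan_only_ctx
gain = percent_increase(vulkan_only_ctx, vulkan_openblas_ctx)

print(f"{extra_tokens} extra tokens, {gain:.1f}% more context")
```

The two counts are in an exact 7:9 ratio, so the gain is 2/7, or roughly 28.6%, which matches the "28%" in the post title.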

Impact: Points to a possible gain in context-window efficiency for local LLM deployments.

Ranking rationale: A user-discovered optimization in open-source LLM inference software.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

r/LocalLLaMA · /u/Warrenio · 2026-06-04 16:58

I can fit 28% more context after building llama.cpp with OpenBLAS. Huh?

I've noticed a weird difference when building llama.cpp with the Vulkan and OpenBLAS backends vs. building with the Vulkan backend only. It seems like llama.cpp can fit significantly more context in VRAM when built with OpenBLAS than when built w…

