llama.cpp - how to free up even more space on your GPU

llama.cpp user shares GPU memory optimization tips

By the PulseAugur editorial team · [1 source] · 2026-06-17 18:23

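A Reddit user is looking for ways to reduce memory usage in llama.cpp, particularly when offloading to the GPU. They share several flags that help cut VRAM consumption, including `--no-mmproj-offload`, `--cache-type-k`, and `--flash-attn`, and ask the community for further tips on freeing GPU memory so they can increase the context size.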
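To give a rough sense of why the `--cache-type-k` flag matters for context size, here is a minimal Python sketch that estimates KV cache size for different cache types. It is an approximation, not llama.cpp's own accounting: the model shape (32 layers, 8 KV heads, head dimension 128, 32768-token context) is a hypothetical example, the function name `kv_cache_bytes` is made up for illustration, and the estimate ignores compute buffers and any other allocations. The per-block sizes are the standard ggml layouts for `f16`, `q8_0`, and `q4_0`.

```python
# Storage cost per ggml tensor type as (bytes per block, elements per block).
GGML_TYPE_SIZES = {
    "f16": (2, 1),
    "q8_0": (34, 32),  # 32 int8 values plus one fp16 scale
    "q4_0": (18, 32),  # 32 4-bit values plus one fp16 scale
}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx,
                   type_k="f16", type_v="f16"):
    """Estimate the bytes needed for the K and V caches (hypothetical helper)."""
    # Number of elements stored in K alone; V has the same element count.
    elems = n_layers * n_kv_heads * head_dim * n_ctx
    total = 0
    for cache_type in (type_k, type_v):
        block_bytes, block_elems = GGML_TYPE_SIZES[cache_type]
        total += elems * block_bytes // block_elems
    return total


if __name__ == "__main__":
    # Hypothetical model shape, chosen only to make the arithmetic concrete.
    shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=32768)
    for type_k in GGML_TYPE_SIZES:
        gib = kv_cache_bytes(**shape, type_k=type_k) / 1024**3
        print(f"--cache-type-k {type_k}: {gib:.2f} GiB")
```

Under these assumptions, switching only the K cache from `f16` to `q8_0` shrinks the estimate from 4 GiB to roughly 3.06 GiB. That is the kind of headroom that can be spent on a larger context window.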

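Impact: Users are sharing techniques for optimizing local LLM inference, which could make it possible to run larger models or larger context windows on consumer hardware.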

Ranking rationale: User-generated tips for optimizing an existing software tool.


Infrastructure

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/LocalLLaMA · /u/imgroot9 · 2026-06-17 18:23

llama.cpp - how to free up even more space on your GPU

For the past week or two, llama.cpp has been working much better from the RAM usage perspective. I no longer see any memory leaks, and everything fits nicely on the GPU - my defaults are `--n-gpu-layers 99 --no-mmap --mlock` to avo…
