Are there more easy techniques than --tensor-split to fill VRAM in llama.cpp?

llama.cpp user seeks VRAM optimization methods beyond tensor-split

By PulseAugur Editorial · [1 source] · 2026-05-29 22:02

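A user on Reddit's r/LocalLLaMA is looking for more effective ways to fill VRAM in llama.cpp, particularly when running mixture-of-experts (MoE) models across multiple GPUs. They currently hand-tune the `--ngl` and `--tensor-split` parameters, which is time-consuming and still leaves VRAM unused. The user asks whether any techniques beyond `--tensor-split` can maximize VRAM utilization, so that more of the model fits on the GPUs and inference runs faster.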
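To illustrate the kind of manual tuning described above, here is a minimal sketch that derives a starting value for `--tensor-split` from per-GPU free memory. The function names, the 1024 MiB reserve, and the memory readings are all hypothetical. In practice the readings could come from a tool such as `nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits`. The result is only a first guess: actual usage also depends on the KV cache, compute buffers, and how individual tensors are placed, so it does not replace trial runs.

```python
def tensor_split_from_free_vram(free_mib, reserve_mib=1024):
    """Return per-GPU proportions based on free memory minus a safety reserve.

    free_mib: free VRAM per GPU in MiB (hypothetical input, one entry per GPU).
    reserve_mib: headroom left on each GPU (an assumed value, tune as needed).
    """
    usable = [max(free - reserve_mib, 0) for free in free_mib]
    total = sum(usable)
    if total == 0:
        raise ValueError("no usable VRAM after applying the reserve")
    return [round(u / total, 3) for u in usable]


def format_tensor_split(proportions):
    """Format proportions as a comma-separated list for --tensor-split."""
    return ",".join(f"{p:g}" for p in proportions)


# Hypothetical free-memory readings in MiB for four GPUs.
free_mib = [24000, 24000, 16000, 12000]
split = tensor_split_from_free_vram(free_mib)
print("--tensor-split", format_tensor_split(split))
```

The printed value can be pasted into a llama.cpp command line and then adjusted by hand if one GPU still runs out of memory or sits underused.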

Impact: Users are exploring ways to maximize hardware efficiency so they can run large models locally.

Ranking rationale: A user discussion about optimizing an existing tool, not a new release or major development.

Read on r/LocalLLaMA →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/LocalLLaMA TIER_1 English(EN) · /u/GregoryfromtheHood · 2026-05-29 22:02

Are there more easy techniques than --tensor-split to fill VRAM in llama.cpp?

Using 4 GPUs with llama.cpp, with MoE models mainly, I try to fit as much in VRAM as I can. --fit does a terrible job and always causes oom by trying to put way too much on 1 gpu or stupid things like that, so I do --ngl 999 and --n-cpu-moe and a…
