Anyone running Deepseek v4 Flash with MoE offload?

Reddit discussion of Deepseek V4 Flash performance with MoE offloading

By the PulseAugur editorial team · [1 source] · 2026-06-25 03:40

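A user on the r/LocalLLaMA subreddit is asking about running the Deepseek V4 Flash model, specifically how it performs with Mixture-of-Experts (MoE) offloading. The user cites several GitHub repositories and Hugging Face pages for forks and modifications of Deepseek V4 models, including work by 'huihui-ai' and 'Fringe210' aimed at improving tensor parallelism and CUDA compatibility. The discussion centers on the technical challenge of fitting a large model, together with its KV cache, into the available VRAM, and on comparing different implementations to find the best-performing one.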
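To see why the KV cache matters so much for the VRAM budget, here is a minimal sketch that estimates its size, assuming a standard multi-head or grouped-query attention layout in which every layer caches one K and one V vector per KV head per token. The layer count, KV-head count, head dimension and context length used below are hypothetical placeholders, not Deepseek V4 Flash's real configuration. DeepSeek's V2 and V3 models use Multi-head Latent Attention, which caches a compressed latent instead of full per-head K and V, so if V4 Flash does the same, this formula would overestimate its cache.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: float = 2.0,
                   batch_size: int = 1) -> float:
    """Estimate KV cache size for a standard (MHA/GQA) attention stack.

    Each layer stores one K and one V vector of n_kv_heads * head_dim
    elements per token, hence the leading factor of 2.
    """
    return (2 * n_layers * n_kv_heads * head_dim * context_len
            * bytes_per_elem * batch_size)


GIB = 1024 ** 3

if __name__ == "__main__":
    # Hypothetical configuration, NOT Deepseek V4 Flash's actual one.
    fp16 = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128,
                          context_len=32768)
    # Idealized 1 byte per element; real 8-bit cache formats add small
    # per-block scale overhead.
    q8 = kv_cache_bytes(48, 8, 128, 32768, bytes_per_elem=1)
    print(f"fp16 KV cache: {fp16 / GIB:.1f} GiB")  # 6.0 GiB
    print(f"8-bit KV cache: {q8 / GIB:.1f} GiB")   # 3.0 GiB
```

The cache grows linearly with context length and batch size, so a shortfall of a few GiB can often be closed by shortening the context or quantizing the cache rather than offloading more weights.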

Impact: Technical users are exploring optimized configurations for running large language models locally.

Ranking rationale: A user-forum discussion about running a specific model under a specific technical configuration.


Infrastructure

AI-generated summary · Google Gemini · from 1 source.


Sources [1]

r/LocalLLaMA TIER_1 English(EN) · /u/fragment_me · 2026-06-25 03:40

Anyone running Deepseek v4 Flash with MoE offload?

I saw the DS4 repo and the last time I tried it I was just short of 5-10GB of VRAM to fit the model I wanted in VRAM with the KV cache.

There are also these repos that caught my eye that I saw on the huihui-ai hugging face page - …
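MoE offload addresses exactly this kind of shortfall: the routed expert weights, which make up most of an MoE model's parameters, are moved to system RAM while attention, shared weights and the KV cache stay on the GPU. The sketch below shows that budgeting step under stated assumptions: experts are offloaded in whole-layer units, non-expert weights and the KV cache must stay in VRAM, and a fixed overhead is reserved for activations and runtime buffers. All sizes are hypothetical placeholders, not measured figures for Deepseek V4 Flash or for any of the repositories mentioned, and the function name is illustrative only.

```python
def split_moe_layers(vram_gib: float, non_expert_gib: float,
                     kv_cache_gib: float, expert_gib_per_layer: float,
                     n_moe_layers: int,
                     overhead_gib: float = 1.0) -> tuple[int, int]:
    """Return (MoE layers whose experts stay on GPU, layers offloaded to CPU).

    Non-expert weights, the KV cache and a fixed overhead are placed on the
    GPU first; the remaining VRAM is filled with whole layers of routed
    experts, and every other expert layer is kept in system RAM.
    """
    free_gib = vram_gib - non_expert_gib - kv_cache_gib - overhead_gib
    if free_gib <= 0:
        on_gpu = 0
    else:
        on_gpu = min(n_moe_layers, int(free_gib // expert_gib_per_layer))
    return on_gpu, n_moe_layers - on_gpu


if __name__ == "__main__":
    # Hypothetical 24 GiB card and hypothetical model sizes.
    gpu, cpu = split_moe_layers(vram_gib=24, non_expert_gib=6,
                                kv_cache_gib=6, expert_gib_per_layer=2.5,
                                n_moe_layers=43)
    print(f"Experts for {gpu} of {gpu + cpu} MoE layers fit on the GPU; "
          f"{cpu} are offloaded")
```

Every GiB saved on the KV cache frees room for more expert layers on the GPU, which is why cache size and offload split are usually tuned together.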
