Dual DGX Sparks: 40 tk/s single 1M; 350 tk/s agg. - Deepseek V4 Flash (vs RTX Pro 6000 vs Mac M2 Ultra 192)

DeepSeek V4-Flash reaches 40 tk/s on dual DGX Sparks

By the PulseAugur editorial team · [1 source] · 2026-06-14 09:07

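A user shared the configuration and benchmarks for running the DeepSeek V4-Flash model on dual DGX Spark hardware. At FP8 precision, the setup achieved roughly 40 tokens per second (tk/s) for a single stream (the "single 1M" figure in the post title), and an aggregate throughput of up to 350 tk/s when serving multiple requests with a 256k context window. The results are compared with Nvidia RTX Pro 6000 and Mac M2 Ultra systems, highlighting the efficiency of the dual-DGX setup for large-model inference.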

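Impact: Demonstrates high-throughput inference of a large model on accessible hardware, which may lower the barrier to advanced AI applications.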

Ranking rationale: User-generated benchmarks and configuration for running a specific LLM on consumer/prosumer hardware.

Read on r/LocalLLaMA →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/LocalLLaMA · English (EN) · /u/elsung · 2026-06-14 09:07

Dual DGX Sparks: 40 tk/s single 1M; 350 tk/s agg. - Deepseek V4 Flash (vs RTX Pro 6000 vs Mac M2 Ultra 192)

First of all, shout-out to Aiden/Antirez and the geniuses at the Nvidia community threads. I'm merely claude-vibing off of their work.

That said, I thought I'd share recipes, learnings, and benchmarks so far on running big MoE models…
