Brief · PulseAugur

TOOL · r/LocalLLaMA English(EN) · 2h

Deepseek V4 flash performance on DGX Spark

A user has successfully configured Deepseek V4 Flash on a DGX Spark system, achieving a maximum context window of 1 million tokens in the KV cache. Performance tests show consistent throughput across various context lengths, with a notable anomaly at 32k tokens. The user reports that Deepseek V4 Flash outperforms other models like M2.7 and Stepfun 3.7 in high-context reasoning, though it lacks the world knowledge of denser models. AI

IMPACT Demonstrates high-context capabilities and performance tuning for large models on specialized hardware.

Kimi K2.6
Deepseek V4 Flash
DGX Spark
vLLM
Gemma4 27B
Qwen3.6 26B
Stepfun 3.7
local-inference-lab