DeepSeek has built a custom kernel stack on DeepGEMM and TileLang that reportedly matches, and in some cases surpasses, the performance of NVIDIA's cuBLAS. The kernels are also reported to be bitwise deterministic and batch-invariant, avoiding the output differences that workload-balancing strategies such as split-K (for GEMM) and split-KV (for attention) can introduce. Those strategies change the order in which partial sums are accumulated, and because floating-point addition is not associative, a different order can produce different bits. The approach is described as controlling how the floating-point reductions are performed so that results stay consistent, which helps with debugging and reproducible training.
AI impact: DeepSeek's custom kernel stack may offer a performance advantage over standard libraries, which could influence future AI infrastructure development and optimization strategies.
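The determinism claim in the summary rests on a general property of floating-point reductions, which the following pure-Python sketch illustrates. It is not DeepSeek's code: `dot_split_k`, `heuristic_splits`, and `fixed_splits` are hypothetical names, the input values are hand-picked to force a rounding difference, and the "heuristic" is an assumed stand-in for a scheduler that picks the split count from the batch size.

```python
def dot_split_k(a, b, num_splits):
    """Dot product with the K dimension cut into `num_splits` chunks.

    Each chunk is accumulated left to right, then the partial sums are
    combined in chunk order, mimicking a split-K style reduction.
    """
    k = len(a)
    chunk = (k + num_splits - 1) // num_splits  # ceiling division
    partials = []
    for start in range(0, k, chunk):
        acc = 0.0
        for x, y in zip(a[start:start + chunk], b[start:start + chunk]):
            acc += x * y
        partials.append(acc)
    total = 0.0
    for p in partials:
        total += p
    return total


def heuristic_splits(batch_size):
    # Hypothetical scheduler: split K for a lone request, not for a full batch.
    return 2 if batch_size == 1 else 1


def fixed_splits(batch_size):
    # Batch-invariant choice: the split count ignores the batch size.
    return 2


# Hand-picked float64 values: 1e16 + 1.0 rounds back to 1e16,
# so the grouping of the additions changes the final result.
row = [1e16, 1.0, -1e16, 1.0]
weights = [1.0, 1.0, 1.0, 1.0]

for name, pick in [("heuristic", heuristic_splits), ("fixed", fixed_splits)]:
    alone = dot_split_k(row, weights, pick(1))    # same row, served alone
    batched = dot_split_k(row, weights, pick(8))  # same row, inside a batch of 8
    print(name, alone, batched, "identical:", alone == batched)
```

With the batch-dependent split count, the same row produces different results depending on how many other requests share the batch; with the fixed split count the results agree. Note that a fixed reduction order gives reproducibility, not extra accuracy: neither variant returns the exact sum of 2.0 here.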
Ranking rationale: The cluster details a technical innovation in custom kernel development for AI model training, including performance benchmarks and technical explanations, which aligns with research-level disclosure.
AI-generated summary · Google Gemini · from 4 sources.