DSV4 blog: https://t.co/T1mlIq1yrZ
Together AI has released DeepSeek V4 Pro, an open-source model whose KV cache architecture differs significantly from that of previous DeepSeek models. The new architecture combines sliding window attention, an indexer, and compression states to improve cache reuse. To optimize performance, Together AI implemented fused attention setup kernels, faster sparse attention kernels, improved kernel overlap, and graph-level optimizations.
AI IMPACT: This release introduces architectural changes to KV caching that could influence future model development and optimization strategies.
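
To make the sliding window part of the cache design concrete, here is a minimal toy sketch of a KV cache that retains only the most recent tokens. It is not DeepSeek's or Together AI's implementation: the class name `SlidingWindowKVCache`, the `window` parameter, and the string placeholders standing in for key/value tensors are all assumptions made for illustration, and the indexer and compression states are not modeled.

```python
from collections import deque


class SlidingWindowKVCache:
    """Toy per-layer KV cache that keeps only the last `window` tokens.

    Hypothetical illustration only; real caches store tensors on the GPU.
    """

    def __init__(self, window: int):
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        # deque(maxlen=...) silently evicts the oldest entry once full.
        self._entries = deque(maxlen=window)

    def append(self, position: int, key, value) -> None:
        """Store the key/value pair for one token position."""
        self._entries.append((position, key, value))

    def visible_positions(self) -> list:
        """Positions a new query token could still attend to."""
        return [pos for pos, _, _ in self._entries]

    def __len__(self) -> int:
        return len(self._entries)


# Feed 10 tokens through a window of 4; only the last 4 remain cached.
cache = SlidingWindowKVCache(window=4)
for pos in range(10):
    cache.append(pos, key=f"k{pos}", value=f"v{pos}")
```

The point of the sketch is that cache memory stays bounded by the window size regardless of sequence length, which is one reason a sliding window changes how cached entries can be reused.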