Together AI has released DeepSeek V4 Pro, an open-source model whose KV cache architecture differs significantly from that of previous DeepSeek models. The new architecture combines sliding window attention, an indexer, and compression states to improve cache reuse. To optimize performance, Together AI implemented fused attention setup kernels, faster sparse attention kernels, better kernel overlap, and graph-level optimizations.
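As a rough illustration of one of the listed ingredients, the sketch below shows how a generic sliding-window KV cache works: it keeps at most `window` recent key/value pairs, so cache memory stays bounded however long the sequence grows. This is a minimal sketch, not DeepSeek V4 Pro's or Together AI's implementation. The source does not describe how the indexer or compression states interact with the window, and the class and method names (`SlidingWindowKVCache`, `append`, `attend`) are hypothetical.

```python
from collections import deque
import math


class SlidingWindowKVCache:
    """Hypothetical sketch: a KV cache that keeps only the last `window` entries."""

    def __init__(self, window: int):
        if window <= 0:
            raise ValueError("window must be positive")
        # deque(maxlen=...) evicts the oldest entry automatically once full
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def append(self, key, value):
        # Store copies of the key/value vectors for the newest token
        self.keys.append(list(key))
        self.values.append(list(value))

    def __len__(self):
        return len(self.keys)

    def attend(self, query):
        # Scaled dot-product attention restricted to the cached window
        if not self.keys:
            raise ValueError("cache is empty")
        d = len(query)
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in self.keys]
        # Subtract the max score for a numerically stable softmax
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        dim = len(self.values[0])
        return [sum(w * v[i] for w, v in zip(weights, self.values))
                for i in range(dim)]
```

For example, with `window=2`, appending five tokens leaves only the last two in the cache. A query therefore attends over those two entries alone, which is the trade-off sliding window attention makes: bounded memory in exchange for a limited local context.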
IMPACT This release introduces architectural innovations in KV caching, potentially influencing future model development and optimization strategies.
RANK_REASON Open-source model release from a recognized AI lab.
Source: Together (inference / OSS), on X.
AI-generated summary · Google Gemini · from 2 sources.