PulseAugur

Together AI releases DeepSeek V4 Pro with novel KV cache architecture

Together AI has released DeepSeek V4 Pro, an open-source model whose KV cache differs fundamentally from that of any prior DeepSeek model. The cache must correctly store state for sliding window attention, an indexer, and compression states in order to get good cache reuse. To make the model run fast, Together AI went beyond rewriting the KV cache from scratch and implemented fused attention setup kernels, faster sparse attention kernels, improved kernel overlap, and graph-level optimizations.
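To illustrate why sliding window state has to be stored correctly for cache reuse, here is a minimal sketch of a toy sliding-window cache. It is not DeepSeek's or Together AI's implementation: the class `SlidingWindowKVCache`, its methods, and the window size of 4 are all hypothetical, and the indexer and compression states mentioned above are not modeled.

```python
from collections import deque


class SlidingWindowKVCache:
    """Toy cache (hypothetical) keeping key/value entries for the last `window` tokens."""

    def __init__(self, window: int):
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        # A bounded deque evicts the oldest entry automatically once full.
        self.entries = deque(maxlen=window)
        self.tokens_seen = 0

    def append(self, key, value):
        # Store the absolute token position alongside the key/value pair.
        self.entries.append((self.tokens_seen, key, value))
        self.tokens_seen += 1

    def visible_positions(self):
        # Positions that a new token could still attend to.
        return [pos for pos, _, _ in self.entries]

    def snapshot(self):
        # Everything needed to resume decoding from this prefix later.
        return {"tokens_seen": self.tokens_seen, "entries": list(self.entries)}

    @classmethod
    def restore(cls, window, snap):
        cache = cls(window)
        cache.entries.extend(snap["entries"])
        cache.tokens_seen = snap["tokens_seen"]
        return cache


cache = SlidingWindowKVCache(window=4)
for t in range(10):
    cache.append(key=f"k{t}", value=f"v{t}")

print(cache.visible_positions())  # [6, 7, 8, 9]
```

In this sketch a reusable prefix needs both the token count and the current window contents. Restoring only one of them would leave the next token attending to the wrong positions, which is the kind of bookkeeping the summary refers to.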

IMPACT This release changes how the KV cache is structured and served, which may influence how future models are designed and optimized for inference.

RANK_REASON Open-source model release from a recognized AI lab.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. X — Together (inference / OSS) TIER_1 Norsk(NO) · togethercompute ·

    DSV4 blog: https://t.co/T1mlIq1yrZ
    DSV4 tech talk: https://t.co/sPiJ3lo6Ry
    Try DSV4: https://t.co/wBC2ldzzyD

  2. X — Together (inference / OSS) TIER_1 English(EN) · togethercompute ·

    DeepSeek V4 Pro has a fundamentally different KV cache than any prior DeepSeek model. Sliding window attention, an indexer, and compression states all need to be stored correctly to get good cache reuse. To get it to run fast we didn't just rewrite the KV cache from scratch, we …