PulseAugur

DeepSeek unveils V4 models with 1M token context and MoE architecture

DeepSeek has released a preview of its DeepSeek-V4 series of Mixture-of-Experts (MoE) language models: DeepSeek-V4-Pro (1.6T parameters, 49B activated) and DeepSeek-V4-Flash (284B parameters, 13B activated). Both models support a one-million-token context length, achieved through a hybrid attention architecture and an optimized residual connection method. Trained on over 32 trillion tokens, the models show significant efficiency gains in long-context scenarios, with DeepSeek-V4-Pro requiring substantially fewer FLOPs and a smaller KV cache for inference than its predecessor.

IMPACT Sets new SOTA for open models in long-context reasoning and efficiency, potentially enabling new classes of AI applications.

RANK_REASON Frontier-lab model release with system card.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.AI TIER_1 English(EN) · DeepSeek-AI, Anyi Xu, Bangcai Lin, Bing Xue, Bingxuan Wang, Bingzheng Xu, Bochao Wu, Bowei Zhang, Chaofan Lin, Chen Dong, Chenchen Ling, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chengyu Hou, Chenhao Xu, Chenze Shao, Chong Ruan, Conner Sun, Damai Dai, Da… ·

    DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

    arXiv:2606.19348v1 Announce Type: cross Abstract: We present a preview version of DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models -- DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) -…
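
The total and activated parameter counts quoted in the abstract give a rough sense of how sparse each MoE model is per token. The sketch below simply divides activated by total parameters; the dictionary layout and the function name `activated_fraction` are illustrative assumptions, and the ratio is only a coarse proxy, not a measure of the FLOPs or KV cache savings the paper reports.

```python
# Parameter counts as quoted in the abstract: (total, activated per token).
MODELS = {
    "DeepSeek-V4-Pro": (1.6e12, 49e9),
    "DeepSeek-V4-Flash": (284e9, 13e9),
}


def activated_fraction(total_params: float, activated_params: float) -> float:
    """Return the share of parameters activated per token (hypothetical helper)."""
    if total_params <= 0:
        raise ValueError("total_params must be positive")
    return activated_params / total_params


if __name__ == "__main__":
    for name, (total, activated) in MODELS.items():
        share = activated_fraction(total, activated)
        print(f"{name}: {share:.1%} of parameters activated per token")
```

Under these figures, roughly 3.1% of DeepSeek-V4-Pro's parameters and 4.6% of DeepSeek-V4-Flash's parameters are activated for each token.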