PulseAugur

DeepSeek AI previews DeepSeek-V4 models with 1M token context

DeepSeek AI has released a preview of its DeepSeek-V4 series, which comprises two Mixture-of-Experts (MoE) models: DeepSeek-V4-Pro and DeepSeek-V4-Flash. Both models support a context length of one million tokens and use a hybrid attention mechanism (CSA and HCA) to improve efficiency. They also use Manifold-Constrained Hyper-Connections (mHC) for stability and the Muon optimizer for faster training.

IMPACT Could set a new reference point for long-context LLMs, potentially driving competition in efficient context handling.

RANK_REASON Frontier-lab model release with system card.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. Hugging Face Trending Models · TIER_1 · Dutch (NL) · deepseek-ai

    deepseek-ai/DeepSeek-V4-Pro-DSpark

    text-generation · 0 downloads · 70 likes