DeepSeek v3, a new 671B-parameter Mixture-of-Experts model, has been released and is currently the top-performing open-weights model available. Serving a model this large is challenging, but inference startup Baseten has deployed DeepSeek v3 on NVIDIA H200 GPUs using the SGLang framework. The deployment highlights three factors in running mission-critical AI inference at scale: model-level performance, efficient serving infrastructure, and robust orchestration.
Source: Latent Space Podcast
- Amir Haghighat
- Baseten
- DeepSeek v3
- Gemini 2
- Hunyuan-Large
- MiniMax-Text
- Mixture-of-Experts
- NVIDIA H200
- SGLang
- Tencent
- X.ai
- Yineng Zhang
AI-generated summary · Google Gemini · from 1 source.