DeepSeek-V2 beats Mixtral 8x22B with >160 experts at HALF the cost

DeepSeek-V2 surpasses Mixtral 8x22B at lower cost and with more experts

By the PulseAugur editorial team · [1 source] · 2024-05-06 23:37

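DeepSeek AI's new model, DeepSeek-V2, outperforms Mixtral 8x22B while consuming significantly less compute. The model uses more than 160 experts, which lets it achieve better results at half the cost. This marks a significant step forward in the design of efficient large language models.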
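Part of how a mixture-of-experts (MoE) model can add experts without a matching rise in cost is sparse activation: a router sends each token to only a few experts, so per-token compute grows with the number of experts selected rather than the total. The sketch below is a generic top-k router, assuming softmax gating with the selected weights renormalized; the expert count, k = 6, the toy experts, and the function names are all hypothetical and are not DeepSeek-V2's published configuration or routing scheme.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    probs = softmax(router_logits)
    # Python's sort is stable even with reverse=True, so ties keep index order.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return [(i, probs[i] / mass) for i in top]

def moe_forward(x, experts, router_logits, k):
    """Combine outputs of only the selected experts; the rest are never evaluated."""
    out = 0.0
    calls = 0
    for i, w in route_top_k(router_logits, k):
        out += w * experts[i](x)
        calls += 1
    return out, calls

# Hypothetical setup: 160 toy "experts", each of which just scales its input.
NUM_EXPERTS = 160
K = 6
experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]

# Hypothetical router scores for one token: experts 10 and 42 score highest.
logits = [0.0] * NUM_EXPERTS
logits[10] = 3.0
logits[42] = 3.0

y, calls = moe_forward(2.0, experts, logits, K)
print(calls)                     # 6 (only the selected experts ran)
print(f"{K / NUM_EXPERTS:.2%}")  # 3.75% (share of experts active per token)
```

Here only 6 of the 160 toy experts run for the token. Renormalizing the selected gates is one common design choice; other MoE variants keep the raw softmax weights instead.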

Ranking rationale: a new model from a major AI lab that surpasses existing models on key benchmarks.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

Smol AINews · Tier 1 · English · 2024-05-06 23:37


**DeepSeek V2** introduces a new state-of-the-art MoE model with **236B parameters** and a novel Multi-Head Latent Attention mechanism, achieving faster inference and surpassing GPT-4 on AlignBench. **Llama 3 120B** shows strong creative writing skills, while Microsoft is reporte…
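The excerpt credits Multi-Head Latent Attention with faster inference but does not explain the mechanism. One widely described idea behind it is compressing keys and values into a small per-token latent vector, so the KV cache stores that latent rather than full per-head keys and values. The sketch below shows only that caching step, assuming plain linear down- and up-projections; `W_down`, `W_up_k`, `W_up_v`, the helper functions, and every dimension are hypothetical, and details of the real mechanism (such as positional-encoding handling and the attention computation itself) are omitted.

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rand_matrix(rows, cols, rng):
    """Random weights stand in for learned projections (hypothetical)."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes, chosen only for illustration.
D_MODEL, N_HEADS, D_HEAD, D_LATENT = 64, 8, 8, 16

rng = random.Random(0)
W_down = rand_matrix(D_LATENT, D_MODEL, rng)           # hidden state -> latent
W_up_k = rand_matrix(N_HEADS * D_HEAD, D_LATENT, rng)  # latent -> keys, all heads
W_up_v = rand_matrix(N_HEADS * D_HEAD, D_LATENT, rng)  # latent -> values, all heads

latent_cache = []  # one D_LATENT-sized vector per past token

def append_token(h):
    """Compress a token's hidden state and cache only the latent vector."""
    latent_cache.append(matvec(W_down, h))

def keys_and_values():
    """Re-expand cached latents into full per-head keys and values on demand."""
    ks = [matvec(W_up_k, c) for c in latent_cache]
    vs = [matvec(W_up_v, c) for c in latent_cache]
    return ks, vs

# Feed 10 random hidden states through the cache.
for _ in range(10):
    append_token([rng.gauss(0, 1) for _ in range(D_MODEL)])

ks, vs = keys_and_values()
cached = sum(len(c) for c in latent_cache)            # numbers actually stored
standard = len(latent_cache) * 2 * N_HEADS * D_HEAD   # full keys + values
print(cached, standard)  # → 160 1280
```

With these toy sizes the cache holds 160 numbers for 10 tokens instead of 1,280, an 8x reduction; the trade-off is extra up-projection work whenever keys and values are needed.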