ZAYA1-8B: a 760M-active MoE trained on AMD MI300x

Zyphra releases ZAYA1-8B, an MoE model with fewer than 1 billion active parameters

By the PulseAugur editorial team · [1 source] · 2026-05-22 03:28

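Zyphra has released ZAYA1-8B, a Mixture-of-Experts (MoE) model with 8.4 billion total parameters that activates only about 760 million of them per token. This design lets it reach performance on math and coding benchmarks comparable to much larger models, including Claude 4.5 Sonnet. The model incorporates architectural improvements such as Compressed Convolutional Attention and an MLP-based router for expert selection, and it was trained on a large cluster of AMD Instinct MI300x nodes.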

Impact: Reaching frontier-level performance with far fewer active parameters could lower inference costs for advanced models.
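A rough back-of-the-envelope calculation shows where the potential saving comes from. The sketch below uses the two parameter counts quoted in the summary and the common rule of thumb that a forward pass costs about 2 FLOPs per active parameter per token. That rule is an approximation introduced here, not a figure from Zyphra, and it ignores attention cost, memory bandwidth, and routing overhead.

```python
# Parameter counts as quoted in the summary (approximate).
TOTAL_PARAMS = 8.4e9    # all experts plus shared weights
ACTIVE_PARAMS = 0.76e9  # parameters used for any single token

def forward_flops_per_token(params):
    """Rule-of-thumb estimate (an assumption): ~2 FLOPs per parameter per token."""
    return 2.0 * params

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
dense_flops = forward_flops_per_token(TOTAL_PARAMS)    # hypothetical dense 8.4B model
sparse_flops = forward_flops_per_token(ACTIVE_PARAMS)  # MoE with ~760M active
compute_ratio = dense_flops / sparse_flops

print(f"active fraction of parameters: {active_fraction:.1%}")
print(f"estimated compute per token vs. a dense 8.4B model: {compute_ratio:.1f}x lower")
```

Under these assumptions only about 9% of the weights are exercised per token. The estimate covers compute only: all 8.4 billion parameters still have to be stored and loaded, so memory requirements do not shrink by the same factor.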

Ranking rationale: A model release from a lab with novel architectural innovations.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

dev.to (LLM tag) · Thousand Miles AI · 2026-05-22 03:28

ZAYA1-8B: a 760M-active MoE trained on AMD MI300x

Zyphra released ZAYA1-8B on May 6, 2026. It's an 8.4B-parameter Mixture-of-Experts model where only about 760M parameters activate per token, and on the math and coding benchmarks Zyphra ran it sits next to models many times its size, including Claude 4.5 Sonnet and DeepSeek-R…

