ZAYA1-8B: a 760M-active MoE trained on AMD MI300X
Zyphra has released ZAYA1-8B, an 8.4-billion-parameter Mixture-of-Experts model that activates only about 760 million parameters per token. This sparse architecture allows it to achieve performance comparable to much larger models, including Claude 4.5 Sonnet, on math and coding benchmarks. The model incorporates architectural changes such as Compressed Convolutional Attention and an MLP-based router for expert selection, and was trained on a large cluster of AMD Instinct MI300X nodes.
AI IMPACT: Achieves performance comparable to much larger models on math and coding benchmarks with significantly fewer active parameters, potentially lowering inference costs for advanced models.
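
To make the sparsity claim concrete, here is a minimal sketch of the two ideas in the summary: the share of parameters active per token (760 million of 8.4 billion, about 9%) and expert selection by an MLP-based router. This is a toy illustration, not Zyphra's implementation. The layer sizes, the ReLU hidden layer, the softmax scoring, and the `top_k` value are all assumptions chosen for readability, and every function name is hypothetical.

```python
import math
import random

TOTAL_PARAMS = 8.4e9    # total parameters, from the summary
ACTIVE_PARAMS = 760e6   # parameters active per token, from the summary


def active_fraction(active, total):
    """Share of the model's parameters used for a single token."""
    return active / total


def make_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_router(token, w_in, w_out, top_k):
    """Score experts with a two-layer MLP and keep the top_k of them.

    Returns (expert_index, weight) pairs; the weights are renormalised
    over the selected experts so they sum to 1.
    """
    hidden = [max(0.0, h) for h in matvec(w_in, token)]  # ReLU hidden layer (assumed)
    probs = softmax(matvec(w_out, hidden))               # one probability per expert
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


# Hypothetical toy sizes; the real model's dimensions are not given in the summary.
D_MODEL, D_HIDDEN, N_EXPERTS, TOP_K = 8, 16, 4, 1

rng = random.Random(0)
w_in = make_matrix(D_HIDDEN, D_MODEL, rng)
w_out = make_matrix(N_EXPERTS, D_HIDDEN, rng)
token = [rng.gauss(0.0, 1.0) for _ in range(D_MODEL)]

routing = mlp_router(token, w_in, w_out, TOP_K)
print(f"active fraction: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
print("selected experts:", routing)
```

Only the experts the router selects run for a given token, which is why the per-token compute tracks the 760 million active parameters rather than the full 8.4 billion.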