significant · [1 source] · 2026-05-22 03:28

Zyphra releases ZAYA1-8B MoE with sub-billion active parameters

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 1 source

Zyphra has released ZAYA1-8B, an 8.4-billion-parameter Mixture-of-Experts (MoE) model that activates only about 760 million parameters per token. On the math and coding benchmarks Zyphra ran, it performs comparably to much larger models, including Claude 4.5 Sonnet. The model incorporates architectural changes such as Compressed Convolutional Attention and an MLP-based router for expert selection, and it was trained on a large cluster of AMD Instinct MI300X nodes.
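
The source does not describe how ZAYA1-8B's router works internally, so the following is only a minimal, hypothetical sketch of generic top-k MoE routing with a two-layer MLP router. It assumes top-2 selection, a ReLU hidden layer, and softmax gates renormalized over the chosen experts; the names `mlp_router`, `route_token`, and `moe_layer`, the layer sizes, and the toy experts are illustrative, not Zyphra's code.

```python
import math
import random

# Hypothetical sketch of top-k MoE routing with an MLP router (not Zyphra's implementation).

def mlp_router(x, w1, w2):
    """Two-layer MLP router: hidden = ReLU(x @ w1), logits = hidden @ w2 (one logit per expert)."""
    hidden = [max(0.0, sum(xi * w1[i][j] for i, xi in enumerate(x))) for j in range(len(w1[0]))]
    return [sum(h * w2[j][e] for j, h in enumerate(hidden)) for e in range(len(w2[0]))]

def softmax(logits):
    """Numerically stable softmax over expert logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def route_token(x, w1, w2, top_k=2):
    """Pick the top_k experts for one token and renormalize their gate weights to sum to 1."""
    probs = softmax(mlp_router(x, w1, w2))
    chosen = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:top_k]
    norm = sum(probs[e] for e in chosen)
    return [(e, probs[e] / norm) for e in chosen]

def moe_layer(x, experts, w1, w2, top_k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    out = [0.0] * len(x)
    for e, gate in route_token(x, w1, w2, top_k):
        y = experts[e](x)  # the other experts are never evaluated for this token
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out

# Toy setup (assumed sizes): 4-dim token, 8-unit router hidden layer, 8 experts.
random.seed(0)
d_model, d_hidden, n_experts = 4, 8, 8
w1 = [[random.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_model)]
w2 = [[random.gauss(0, 1) for _ in range(n_experts)] for _ in range(d_hidden)]
# Each toy "expert" just scales its input by a different constant.
experts = [lambda v, s=s: [s * vi for vi in v] for s in range(1, n_experts + 1)]

x = [0.5, -1.0, 0.25, 2.0]
print(route_token(x, w1, w2, top_k=2))
print(moe_layer(x, experts, w1, w2, top_k=2))
```

The point of the design is that per-token compute scales with the experts actually selected, not with the total number of experts, which is how a model can hold 8.4B parameters while running far fewer per token.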


IMPACT Matches much larger models on Zyphra's math and coding benchmarks while activating under 1B parameters per token, which could lower inference costs for capable models.
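
A back-of-envelope check on the cost argument, using the common rule of thumb of roughly 2 FLOPs per active parameter per token for a forward pass. This is an approximation we assume here, not a figure from Zyphra; it ignores attention costs that grow with context length, and the helper names are illustrative.

```python
# Rough per-token compute estimate; parameter counts are the figures reported for ZAYA1-8B.
TOTAL_PARAMS = 8.4e9   # total parameters
ACTIVE_PARAMS = 760e6  # approximate parameters active per token

def active_fraction(active, total):
    """Share of the model's parameters used for each token."""
    return active / total

def forward_flops_per_token(params):
    """Rule of thumb (assumption): ~2 FLOPs (multiply + add) per parameter per token."""
    return 2 * params

frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)  # hypothetical dense model of the same size

print(f"active fraction: {frac:.1%}")
print(f"MoE:   ~{moe_flops / 1e9:.2f} GFLOPs/token")
print(f"dense: ~{dense_flops / 1e9:.2f} GFLOPs/token")
```

Under this estimate, about 9% of the weights do work on each token. Note that all 8.4B parameters still have to be stored, so MoE reduces per-token compute rather than weight memory.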

RANK_REASON Model release from a lab, with new architectural components.


COVERAGE [1]

dev.to — LLM tag TIER_1 · Thousand Miles AI · 2026-05-22 03:28

ZAYA1-8B: a 760M-active MoE trained on AMD MI300x

Zyphra released ZAYA1-8B on May 6, 2026. It's an 8.4B-parameter Mixture-of-Experts model where only about 760M parameters activate per token, and on the math and coding benchmarks Zyphra ran it sits next to models many times its size, including Claude 4.5 Sonnet and DeepSeek-R…
