PulseAugur

Zyphra's ZAYA1-8B model matches larger rivals with 700M active parameters

Zyphra has released ZAYA1-8B, a reasoning-focused mixture-of-experts model with 700 million active and 8 billion total parameters. The model was trained from scratch on an AMD compute platform and uses a novel four-stage reinforcement learning cascade. Through its reasoning-focused training methodology and an answer-preserving trimming scheme, ZAYA1-8B achieves competitive performance on mathematics and coding benchmarks, even against significantly larger models.

Summary written by gemini-2.5-flash-lite from 1 source.
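To make the active-versus-total parameter distinction concrete, below is a minimal, illustrative sketch of top-k mixture-of-experts routing in PyTorch. This is not Zyphra's MoE++ architecture, and all sizes (d_model, n_experts, k) are hypothetical; it only shows why each token touches just a fraction of a MoE layer's parameters.

import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    # Each token is routed to k of n_experts feed-forward experts, so only
    # a fraction of the layer's total parameters are "active" per token.
    def __init__(self, d_model: int, d_ff: int, n_experts: int, k: int):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)  # pick k experts per token
        weights = weights.softmax(dim=-1)                   # normalize routing weights
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e  # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Toy usage: with 8 experts and k=2, each token activates ~2/8 of the expert
# parameters, analogous in spirit to ZAYA1-8B's 700M-active / 8B-total split.
moe = TopKMoE(d_model=64, d_ff=256, n_experts=8, k=2)
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])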

IMPACT Strong performance on reasoning benchmarks with only 700M active parameters suggests a path toward more efficient reasoning models.

RANK_REASON This is a technical report detailing a new model release from a non-frontier lab.

Read on arXiv cs.CL →

COVERAGE [1]

  1. arXiv cs.CL TIER_1 · Robert Washbourne, Rishi Iyer, Tomas Figliolia, Henry Zheng, Ryan Lorig-Roach, Sungyeon Yang, Pritish Yuvraj, Quentin Anthony, Yury Tokpanov, Xiao Yang, Ganesh Nanduru, Stephen Ebert, Praneeth Medepalli, Skyler Szot, Srivatsan Rajagopal, Alex Ong, Bhavana …

    ZAYA1-8B Technical Report

    arXiv:2605.05365v1 Announce Type: cross Abstract: We present ZAYA1-8B, a reasoning-focused mixture-of-experts (MoE) model with 700M active and 8B total parameters, built on Zyphra's MoE++ architecture. ZAYA1-8B's core pretraining, midtraining, and supervised fine-tuning (SFT) were…