Arcee AI has released Trinity Large, an open-weight LLM built as a 400-billion-parameter Mixture-of-Experts model with 13 billion active parameters. The model incorporates several architectural choices, including alternating local and global attention layers in a 3:1 ratio, with a 4096-token window for the local layers. It also uses QK-Norm for training stability, omits positional embeddings in the global attention layers, and adds a gated attention mechanism to improve generalization and mitigate attention sinks. Arcee AI also released two smaller variants, Trinity Mini and Trinity Nano, alongside a technical report detailing the architecture.
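The attention layout described above can be made concrete with a small sketch. It is illustrative only: the 3:1 ratio, the 4096-token window, and the 400B/13B parameter counts come from the summary, while the function names, the eight-layer depth, the choice to end each group with the global layer, and the convention that the window includes the current token are assumptions, not details confirmed by Arcee AI.

```python
from typing import List

LOCAL_PER_GLOBAL = 3   # 3:1 local-to-global ratio (from the summary)
WINDOW = 4096          # local attention window in tokens (from the summary)

# Share of parameters active per token: 13B out of 400B.
ACTIVE_FRACTION = 13e9 / 400e9


def layer_schedule(num_layers: int, local_per_global: int = LOCAL_PER_GLOBAL) -> List[str]:
    """Return a repeating pattern of local layers followed by one global layer.

    Assumption: the global layer closes each group; the real ordering may differ.
    """
    period = local_per_global + 1
    return ["global" if (i + 1) % period == 0 else "local" for i in range(num_layers)]


def can_attend(query_pos: int, key_pos: int, kind: str, window: int = WINDOW) -> bool:
    """Causal visibility rule for a single query/key pair.

    Assumption: the local window counts the query token itself, so a query
    sees keys at distances 0 .. window - 1 behind it.
    """
    if key_pos > query_pos:        # causal: never look at future tokens
        return False
    if kind == "global":           # global layers see the whole prefix
        return True
    return query_pos - key_pos < window


if __name__ == "__main__":
    # Hypothetical 8-layer stack, chosen only to show the pattern.
    print(layer_schedule(8))
    print(f"active parameter share: {ACTIVE_FRACTION:.2%}")
```

Under these assumptions, a token at position 5000 can still reach position 0 in a global layer but not in a local one, which is why the occasional global layers matter for long-range context.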
Read on Ahead of AI (Sebastian Raschka) →
AI-generated summary · Google Gemini · from 1 source.