Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains
JetBrains has released Mellum2, an open-source, 12-billion-parameter Mixture-of-Experts (MoE) model optimized for efficient inference on text and code tasks. Because it activates only a fraction of its parameters for each token, it delivers faster, lower-latency inference suited to routing, retrieval-augmented generation (RAG) pipelines, and sub-agent tasks within larger AI systems. Separately, several research papers explore advances in MoE architectures: efficient serving techniques such as CRAFT, new aggregation methods such as DAG-MoE, adaptive gating with Kappa-SwiGLU, probabilistic routing with ProbMoE, and expert-merging strategies inspired by game theory.
AI IMPACT: Mellum2's efficiency and specialized design offer a faster, cheaper option for specific tasks within larger AI systems, potentially accelerating the adoption of modular AI architectures.
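"Activating only a fraction of parameters per token" is what sparse expert routing provides in an MoE layer. The sketch below shows generic top-k routing with a softmax over the selected router scores. It is not Mellum2's actual router: the expert count, k=2, the router weights, and the toy scaling experts are all hypothetical, chosen only to make the mechanism visible.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their scores with softmax."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def make_expert(scale, call_log, idx):
    """Hypothetical toy expert: scales its input and records that it was called."""
    def expert(x):
        call_log.append(idx)
        return [scale * xi for xi in x]
    return expert


def moe_forward(x, experts, router_weights, k=2):
    """Route one token vector x through only k of the experts (hypothetical setup)."""
    # Router logit for expert e is the dot product of x with router row e.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    out = [0.0] * len(x)
    routes = top_k_route(logits, k)
    for idx, weight in routes:
        y = experts[idx](x)  # unselected experts are never evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, routes


calls = []
experts = [make_expert(s, calls, i) for i, s in enumerate([1.0, 2.0, 3.0, 4.0])]
router_weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
out, routes = moe_forward([2.0, 1.0], experts, router_weights, k=2)
print(calls)  # → [0, 2]
```

Only two of the four experts execute for this token, so compute scales with k rather than with the total number of experts. This is the property that makes a sparse MoE cheaper per token than a dense model of the same total size.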