Open-source framework accelerates LLM training with MoE/MoD

By PulseAugur Editorial · [1 source] · 2026-06-07 20:17

A developer has released an open-source PyTorch framework, licensed under Apache 2.0, for training large language models with Mixture of Experts (MoE) and Mixture of Depths (MoD) architectures. It includes custom CUDA kernels for RMSNorm, RoPE, SwiGLU, and MoE routing, which the author reports as 2 to 7x faster than standard PyTorch, along with DeepSpeed integration and an adaptive training orchestrator that automatically manages settings such as learning rate and expert pruning. The framework supports models ranging from 500,000 to 300 billion parameters and is compatible with Apple Silicon.
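For readers unfamiliar with MoE routing, the idea can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the framework's code: the function names `top_k_route` and `moe_forward`, the choice of `k=2`, and the renormalization of the selected weights are all assumptions made for the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts for one token.

    Returns [(expert_index, weight)], with weights renormalized to sum to 1.
    (Hypothetical helper; renormalization is one common choice, not the only one.)
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, router_logits, experts, k=2):
    """Weighted sum of the selected experts' outputs for a single token."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)  # only the chosen experts are evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out
```

Because only `k` experts run per token, the compute per token stays roughly constant as the number of experts grows, which is the usual motivation for the architecture.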

IMPACT This framework could enable more efficient training of large language models, potentially lowering the barrier to entry for developing advanced AI.

RANK_REASON This is an open-source release of a framework for training LLMs, which falls under research.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/RefrigeratorCalm9701 · 2026-06-07 20:17

I built a PyTorch MoE/MoD training framework with custom CUDA kernels [Apache 2.0]

PyTorch framework for training transformer LLMs with MoE and MoD architecture support, custom CUDA kernels, and DeepSpeed integration. Key things it does:
- Custom CUDA kernels for RMSNorm, RoPE, SwiGLU, MoE routing. 2 to 7x faster …
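Two of the operations named in the post, RMSNorm and SwiGLU, are simple enough to write out as reference math. The sketch below is plain Python for illustration and makes no claim about how the framework's kernels are implemented; the function names and the `eps` default are assumptions. Fused GPU kernels for operations like these typically gain speed by computing the result in a single pass over memory rather than in several separate steps.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: scale x by the reciprocal of its root-mean-square, then by a learned weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def swiglu(gate, up):
    """Element-wise part of SwiGLU: silu(gate) * up.

    In a transformer feed-forward layer, gate and up would come from two
    separate linear projections of the same input (omitted here).
    """
    return [silu(g) * u for g, u in zip(gate, up)]
```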
