New DTop-p MoE offers dynamic routing for efficient foundation model training

By PulseAugur Editorial · [1 source] · 2026-06-01 04:00

Researchers have introduced DTop-p MoE, a routing mechanism for sparse Mixture-of-Experts (MoE) architectures used in foundation model pre-training. The method uses a Proportional-Integral controller to adjust the Top-p probability threshold dynamically, and it selects experts layer by layer under a global sparsity constraint. In the reported experiments, DTop-p MoE outperforms standard Top-k and fixed Top-p routing in Large Language Models and Diffusion Transformers at comparable computational cost.
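
To make the idea concrete, here is a minimal Python sketch of Top-p expert selection driven by a Proportional-Integral controller. It is an illustration under stated assumptions, not the paper's implementation: the names `top_p_select`, `PIThreshold`, and `simulate`, the gains `kp` and `ki`, the clamp bounds, and the random router logits are all hypothetical, and the sketch tracks a single threshold without modeling the layer-wise behavior described above.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_select(probs, p):
    """Smallest set of experts, taken in descending probability, whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, mass = [], 0.0
    for i in order:
        chosen.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return chosen


class PIThreshold:
    """Hypothetical PI controller that nudges the Top-p threshold toward a sparsity target."""

    def __init__(self, target_active, p_init=0.5, kp=0.005, ki=0.01,
                 p_min=0.05, p_max=0.99):
        self.target_active = target_active  # desired mean number of active experts
        self.p_init = p_init
        self.kp, self.ki = kp, ki           # assumed gains, not taken from the paper
        self.p_min, self.p_max = p_min, p_max
        self.integral = 0.0
        self.p = p_init

    def update(self, observed_active):
        # Positive error: too few experts are active, so the threshold should rise.
        error = self.target_active - observed_active
        self.integral += error
        raw = self.p_init + self.kp * error + self.ki * self.integral
        self.p = min(self.p_max, max(self.p_min, raw))
        return self.p


def simulate(steps=300, tokens=256, num_experts=16, target_active=2.0, seed=0):
    """Route random tokens and let the controller adapt p.

    Returns the final threshold and the mean active-expert count of the last batch.
    """
    rng = random.Random(seed)
    ctrl = PIThreshold(target_active)
    mean_active = 0.0
    for _ in range(steps):
        counts = []
        for _ in range(tokens):
            # Random logits stand in for a learned router (an assumption).
            probs = softmax([rng.gauss(0.0, 1.0) for _ in range(num_experts)])
            counts.append(len(top_p_select(probs, ctrl.p)))
        mean_active = sum(counts) / tokens
        ctrl.update(mean_active)
    return ctrl.p, mean_active
```

Calling `simulate()` returns the final threshold together with the mean number of active experts in the last batch, which can be compared against `target_active`. Unlike Top-k, the number of experts chosen per token varies with how concentrated the router probabilities are, while the controller keeps the average near the target.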

IMPACT Introduces a more efficient routing mechanism for MoE architectures, potentially improving training scalability and performance for large models.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Can Jin, Hongwu Peng, Mingcan Xiang, Qixin Zhang, Xiangchi Yuan, Amit Hasan, Ohiremen Dibua, Yifan Gong, Yan Kang, Dimitris N. Metaxas · 2026-06-01 04:00

DTop-p MoE: Sparsity-Controlled Dynamic Top-p MoE for Foundation Model Pre-training

arXiv:2512.13996v2. Abstract: Sparse Mixture-of-Experts architectures are essential for scaling model capacity efficiently, yet the standard Top-$k$ routing imposes a rigid sparsity pattern that ignores the intrinsic variance in token difficulty and layer-sp…
