A new fused Mixture-of-Experts (MoE) dispatch kernel, written entirely in Triton, achieves 89-131% of the performance of Stanford's Megablocks library. The same kernel also runs on AMD MI300X GPUs without code changes. Its main optimization fuses the gate and projection operations so that intermediate results stay in registers instead of being written to and read back from global memory, which cuts global memory traffic.
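The summary does not spell out exactly which operations are fused. A rough byte count shows why keeping intermediates in registers matters. It assumes (an assumption, not confirmed by the source) a SwiGLU-style expert in which a "gate" projection and an "up" projection are combined elementwise. The function `intermediate_traffic_bytes` is hypothetical and counts only intermediate-activation traffic, not weight or input reads:

```python
def intermediate_traffic_bytes(tokens: int, inter_dim: int,
                               bytes_per_elem: int = 2, fused: bool = False) -> int:
    """Global-memory bytes moved for an expert's intermediate activations.

    Hypothetical model of a SwiGLU expert: h = silu(x @ W_gate) * (x @ W_up).
    Weight reads, input reads, and the down projection's read of h are the
    same in both variants, so they are left out of the count.
    """
    elems = tokens * inter_dim
    if fused:
        # One kernel computes both projections per tile and applies
        # silu(g) * u in registers; only h is written to global memory.
        return elems * bytes_per_elem
    # Unfused: write gate_out and up_out (2 passes), read both back in a
    # separate elementwise kernel (2 passes), then write h (1 pass).
    return 5 * elems * bytes_per_elem


if __name__ == "__main__":
    # Illustrative sizes only, not the configuration benchmarked in the source.
    T, I = 4096, 14336
    unfused = intermediate_traffic_bytes(T, I)
    fused = intermediate_traffic_bytes(T, I, fused=True)
    print(f"unfused: {unfused / 2**20:.0f} MiB, fused: {fused / 2**20:.0f} MiB")
    # prints "unfused: 560 MiB, fused: 112 MiB"
```

Under this simplified model the fused path moves one fifth of the intermediate bytes. Real savings depend on tiling, cache behavior, and whether the down projection is fused as well.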
IMPACT Enables more efficient MoE model inference and may improve performance on hardware beyond NVIDIA GPUs, including AMD GPUs.
RANK_REASON The cluster describes a new kernel implementation and benchmark results for a specific AI model architecture, which falls under research.
AI-generated summary · Google Gemini · from 1 source.