Brief · PulseAugur

TOOL · r/MachineLearning English(EN) · 2w

Cross-Platform Fused MoE Dispatch in Triton: Portable Expert Routing Without CUDA [R]

Researchers have developed TritonMoE, an inference kernel for Mixture-of-Experts (MoE) models written entirely in OpenAI's Triton language. Because it contains no vendor-specific code, the same kernel runs on both NVIDIA and AMD hardware. It reportedly achieves higher throughput than existing implementations such as Megablocks on shorter token sequences, though it faces limitations with very long contexts or a high number of experts.
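For readers unfamiliar with what "expert routing" and "dispatch" mean here, the following is a minimal plain-Python sketch of the general technique, not TritonMoE's kernel. It assumes softmax gating with top-k selection and renormalized weights, which is a common MoE setup but is not confirmed by the source; all function names (`route_top_k`, `dispatch`, `moe_forward`) are hypothetical. A fused GPU kernel would carry out these same steps on-device in one pass.

```python
import math

def softmax(xs):
    # Numerically stable softmax over one token's router logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    # For each token, pick the k highest-probability experts and
    # renormalize their weights so they sum to 1 (an assumed convention).
    assignments = []
    for logits in router_logits:
        probs = softmax(logits)
        top = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:k]
        norm = sum(probs[e] for e in top)
        assignments.append([(e, probs[e] / norm) for e in top])
    return assignments

def dispatch(assignments, num_experts):
    # Group (token_index, weight) pairs by expert, so each expert
    # can process all of its tokens as one batch.
    buckets = [[] for _ in range(num_experts)]
    for token_idx, picks in enumerate(assignments):
        for expert, weight in picks:
            buckets[expert].append((token_idx, weight))
    return buckets

def moe_forward(tokens, router_logits, experts, k=2):
    # Route, dispatch, run each expert on its bucket, then combine
    # the weighted expert outputs back into per-token results.
    assignments = route_top_k(router_logits, k)
    buckets = dispatch(assignments, len(experts))
    out = [0.0] * len(tokens)
    for expert_id, bucket in enumerate(buckets):
        for token_idx, weight in bucket:
            out[token_idx] += weight * experts[expert_id](tokens[token_idx])
    return out

# Toy usage: scalar "tokens" and three scalar "experts".
toy_experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x]
toy_logits = [[5.0, 0.0, 0.0], [0.0, 0.0, 5.0]]
toy_output = moe_forward([1.0, 2.0], toy_logits, toy_experts, k=1)
```

In this sketch the per-expert buckets vary in size from batch to batch, which is one reason dispatch is usually fused with the expert computation instead of being run as a separate step.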

IMPACT Enables more efficient and portable inference for Mixture-of-Experts models across different hardware architectures.

NVIDIA
AMD
A100
MI300X
OpenAI Triton
TritonMoE