PulseAugur / Brief

last 24h
[1/1] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. Cross-Platform Fused MoE Dispatch in Triton: Portable Expert Routing Without CUDA [R]

    Researchers have developed TritonMoE, an inference kernel for Mixture-of-Experts (MoE) models written entirely in OpenAI's Triton language. Because it contains no vendor-specific code, the same kernel runs on both NVIDIA and AMD hardware. It reportedly outperforms existing methods such as MegaBlocks in throughput on shorter token sequences, though it faces limitations with very long contexts or a large number of experts.

    IMPACT: Enables more efficient and portable inference for Mixture-of-Experts models across different hardware architectures.
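
    For readers unfamiliar with the term, "expert routing" or "dispatch" in an MoE layer means picking the top-k experts per token, grouping tokens by expert, running each expert on its group, and combining the weighted results. The plain-Python sketch below illustrates only that general idea. It is not the TritonMoE kernel and makes no claim about how that kernel is written; the function names (`route_top_k`, `dispatch`, `moe_forward`) and the use of scalar "tokens" are assumptions chosen for brevity.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """For each token, return [(expert_id, weight), ...] for its k best experts.

    Weights are renormalized so they sum to 1 over the selected experts.
    """
    routes = []
    for logits in router_logits:
        probs = softmax(logits)
        top = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:k]
        norm = sum(probs[e] for e in top)
        routes.append([(e, probs[e] / norm) for e in top])
    return routes


def dispatch(routes, num_experts):
    """Group (token_index, weight) pairs by expert.

    A fused kernel would perform this grouping on the accelerator;
    here it is an ordinary list of buckets.
    """
    buckets = [[] for _ in range(num_experts)]
    for token, choices in enumerate(routes):
        for expert, weight in choices:
            buckets[expert].append((token, weight))
    return buckets


def moe_forward(tokens, router_logits, experts, k):
    """Run each expert on its bucket and sum the weighted outputs per token."""
    routes = route_top_k(router_logits, k)
    buckets = dispatch(routes, len(experts))
    out = [0.0] * len(tokens)
    for expert_id, bucket in enumerate(buckets):
        for token, weight in bucket:
            out[token] += weight * experts[expert_id](tokens[token])
    return out


# Hypothetical toy setup: three "experts" that just scale their input.
experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x]
tokens = [1.0, 2.0]
router_logits = [[10.0, 0.0, -10.0], [-10.0, 0.0, 10.0]]
result = moe_forward(tokens, router_logits, experts, k=1)
```

    With k=1, each token goes to exactly one expert with weight 1, so the first token passes through the identity expert and the second through the 3x expert. The grouping step is the part that such kernels aim to fuse with the expert computation.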