PulseAugur

MoonMath AI open-sources HIP attention kernel for AMD MI300X, beating AITER v3

MoonMath AI has open-sourced a bf16 forward attention kernel for AMD's MI300X GPU, written in HIP. The kernel reportedly outperforms AMD's own AITER v3 across the tested configurations, with speedups of up to 1.26x. The gains are attributed to deliberate memory placement and a one-instruction assembly wrapper technique, which wraps individual instructions in inline assembly to give precise control over the operations issued while leaving register allocation to the compiler. The kernel has already been integrated into SGLang to accelerate video diffusion models such as Wan2.1.
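To make the one-instruction wrapper idea concrete, here is a minimal sketch, not MoonMath's actual code. It assumes the wrapper pattern looks like ordinary HIP inline assembly: one function, one instruction, with operand constraints (`"v"` for a vector register) so that the compiler still chooses the registers. The function names `exp2_one_instr` and `attn_weight` are hypothetical, and the choice of `v_exp_f32` (the AMD base-2 exponential instruction) is only an illustration. A host fallback is included so the snippet also builds with a plain C++ compiler.

```cpp
#include <cmath>

// With hipcc the function is usable on both host and device; with a plain
// host compiler the HIP qualifiers are simply not used.
#if defined(__HIPCC__)
#include <hip/hip_runtime.h>
#define ATTN_FN __host__ __device__ __forceinline__
#else
#define ATTN_FN inline
#endif

// Hypothetical one-instruction wrapper (an assumption, not the released code).
// On the device it emits exactly one v_exp_f32 (computes 2^x). The "=v"/"v"
// constraints ask for vector registers but leave the choice of which
// registers to the compiler's allocator.
ATTN_FN float exp2_one_instr(float x) {
#if defined(__HIP_DEVICE_COMPILE__)
    float r;
    // volatile keeps the compiler from deleting or merging this instruction.
    asm volatile("v_exp_f32 %0, %1" : "=v"(r) : "v"(x));
    return r;
#else
    return std::exp2(x);  // host fallback so the sketch can be tested on a CPU
#endif
}

// Illustrative use: an unnormalized softmax weight exp(score - row_max),
// rewritten as 2^((score - row_max) * log2(e)) so it maps onto the wrapper.
ATTN_FN float attn_weight(float score, float row_max) {
    const float kLog2e = 1.4426950408889634f;  // log2(e)
    return exp2_one_instr((score - row_max) * kLog2e);
}
```

Calling `attn_weight(s, m)` with `s == m` yields 1, and smaller scores yield weights between 0 and 1. The point of the pattern is that the programmer pins the instruction while register assignment and surrounding scheduling remain the compiler's job.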

IMPACT This optimized kernel could accelerate AI inference on AMD hardware, potentially lowering costs and increasing adoption.

RANK_REASON Open-source release of a specialized GPU kernel with performance benchmarks.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. MarkTechPost · TIER_1 · English (EN) · Asif Razzaq

    MoonMath AI Open-Sources a HIP Attention Kernel for AMD MI300X That Beats AITER v3 on Every Shape and Rounding Mode

    The HIP kernel uses one-instruction asm wrappers and an eight-wave pipeline to outperform AMD's AITER v3 on MI300X.

    Source post: https://www.marktechpost.com/2026/06/22/moonmath-ai-open-sources-a-hip-attention-kernel-for-amd-mi300x-that-beats-aiter-v3-on-every-shap…