PulseAugur / Brief

last 24h
[1/1] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. EvoTensile: Evolutionary algorithms for AMD Tensile GEMM kernel tuning

    EvoTensile is a new tool for tuning AMD Tensile GEMM kernels, which underpin AI model training and inference. It uses evolutionary algorithms to search the kernel parameter space for the fastest configurations. On AMD's Strix Halo (gfx1151) hardware, for instance, tuning NT-layout kernels with EvoTensile doubled throughput from 20 to 40 TFLOPS, approaching the theoretical roofline. The developer hopes the tool will be integrated into mainstream ROCm libraries for broader adoption.

    IMPACT Optimized kernels can lead to faster AI model training and inference, potentially reducing computational costs and accelerating development cycles.
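
    The summary does not describe how EvoTensile's search works internally, so the following is only a generic sketch of evolutionary parameter tuning, not the tool's actual method. The parameter names (`tile_m`, `tile_n`, `unroll`), their value ranges, and the `mock_throughput` scoring function are all hypothetical stand-ins; a real tuner would compile and benchmark each candidate kernel on the GPU instead.

```python
import random

# Hypothetical search space: illustrative names only, not Tensile's real parameters.
SEARCH_SPACE = {
    "tile_m": [16, 32, 64, 128],
    "tile_n": [16, 32, 64, 128],
    "unroll": [1, 2, 4, 8],
}


def mock_throughput(cfg):
    """Synthetic stand-in for a kernel benchmark (assumption, not measured data).

    Scores highest at tile_m=64, tile_n=64, unroll=4 and drops as any
    parameter moves away from that point in its list of allowed values.
    """
    target = {"tile_m": 64, "tile_n": 64, "unroll": 4}
    penalty = sum(
        abs(SEARCH_SPACE[k].index(cfg[k]) - SEARCH_SPACE[k].index(target[k]))
        for k in SEARCH_SPACE
    )
    return 40.0 - 5.0 * penalty


def random_config(rng):
    """Draw one candidate uniformly from the search space."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}


def mutate(cfg, rng):
    """Resample a single randomly chosen parameter."""
    child = dict(cfg)
    key = rng.choice(list(SEARCH_SPACE))
    child[key] = rng.choice(SEARCH_SPACE[key])
    return child


def crossover(a, b, rng):
    """Take each parameter from one of the two parents at random."""
    return {k: rng.choice([a[k], b[k]]) for k in SEARCH_SPACE}


def evolve(fitness, generations=30, pop_size=12, elite=4, seed=0):
    """Elitist evolutionary search: keep the top candidates, breed the rest."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:elite]  # survivors carried over unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve(mock_throughput)
    print(best, mock_throughput(best))
```

    Because the top candidates survive each generation unchanged, the best score never regresses; the trade-off is that a small elite can converge early on a local optimum, which larger populations or higher mutation rates help counter.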