EvoTensile optimizes AMD Tensile GEMM kernels with evolutionary algorithms

By PulseAugur Editorial · [1 source] · 2026-06-19 07:32

EvoTensile is a new tool that tunes AMD Tensile GEMM kernels, which underpin AI model training and inference. It uses evolutionary algorithms to search the kernel parameter space for faster configurations. On AMD's Strix Halo (gfx1151) hardware, kernels for the NT layout tuned with EvoTensile reportedly rose from 20 to 40 TFLOPS, approaching the theoretical roofline. The developer hopes the tool will be integrated into mainstream ROCm libraries for broader adoption.
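To make the idea of an evolutionary parameter search concrete, here is a minimal sketch in Python. It is not EvoTensile's actual algorithm: the parameter names (`tile_m`, `tile_n`, `unroll_k`, `workgroup`), their candidate values, and the `synthetic_score` function are all hypothetical stand-ins. A real tuner would replace the score with a measured kernel benchmark.

```python
import random

# Hypothetical search space: names and values are illustrative only,
# not Tensile's real parameter set.
SEARCH_SPACE = {
    "tile_m": [16, 32, 64, 128],
    "tile_n": [16, 32, 64, 128],
    "unroll_k": [8, 16, 32, 64],
    "workgroup": [64, 128, 256],
}


def random_config(rng):
    """Draw one candidate configuration uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def mutate(config, rng, rate=0.3):
    """Resample each parameter independently with probability `rate`."""
    child = dict(config)
    for name, values in SEARCH_SPACE.items():
        if rng.random() < rate:
            child[name] = rng.choice(values)
    return child


def crossover(a, b, rng):
    """Build a child by picking each parameter from one of two parents."""
    return {name: rng.choice((a[name], b[name])) for name in SEARCH_SPACE}


def synthetic_score(config):
    """Stand-in for a benchmark run (assumption): counts how many
    parameters match an arbitrary 'best' configuration."""
    target = {"tile_m": 64, "tile_n": 128, "unroll_k": 32, "workgroup": 256}
    return sum(config[name] == target[name] for name in target)


def evolve(fitness, generations=30, population_size=20, elite=4, seed=0):
    """Elitist evolutionary loop: keep the top `elite` configurations,
    refill the population with mutated crossovers of those parents."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:elite]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - elite)
        ]
        population = parents + children
    return max(population, key=fitness)


best = evolve(synthetic_score)
print(best, synthetic_score(best))
```

Because the best configurations are carried over unchanged each generation, the top score never decreases. In practice each fitness evaluation is a timed kernel run, so population size and generation count trade search quality against benchmarking time.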

IMPACT Optimized kernels can lead to faster AI model training and inference, potentially reducing computational costs and accelerating development cycles.

RANK_REASON The item describes a new tool and method for optimizing hardware kernels, a research-level development in AI infrastructure.

infra

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/StableDiffusion TIER_2 English(EN) · /u/woct0rdho · 2026-06-19 07:32

EvoTensile: Evolutionary algorithms for AMD Tensile GEMM kernel tuning

There has been an effort to tune kernels in hipBLASLt so that the most basic matmuls run faster. On Strix Halo (gfx1151), GEMMs with NN and TN input layouts (used in inference) are known to be well-tuned already, while NT and TT layouts (used i…
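For readers unfamiliar with the layout names, the following pure-Python sketch illustrates the usual BLAS convention, where each letter says whether the corresponding operand is used as stored (N) or transposed (T). The helper names are hypothetical and the example ignores the row-major versus column-major memory details that matter to a real GPU kernel.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gemm(a, b, layout):
    """Compute op(a) @ op(b). `layout` has one letter per operand:
    'N' uses the operand as stored, 'T' transposes it first."""
    op_a = transpose(a) if layout[0] == "T" else a
    op_b = transpose(b) if layout[1] == "T" else b
    return matmul(op_a, op_b)


a = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
b = [[7, 8], [9, 10], [11, 12]]   # 3 x 2

reference = gemm(a, b, "NN")
# The same product, with one or both operands stored transposed.
same_nt = gemm(a, transpose(b), "NT")
same_tn = gemm(transpose(a), b, "TN")
same_tt = gemm(transpose(a), transpose(b), "TT")
print(reference)
```

All four calls produce the same mathematical result. What differs is how the inputs are stored, which changes memory access patterns and is why each layout needs its own kernel tuning.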
