Brief · PulseAugur

TOOL · arXiv cs.LG English(EN) · 7h

CuTeGen: An LLM-Based Agentic Framework for Generation and Optimization of High-Performance GPU Kernels using CuTe

Researchers have developed CuTeGen, a framework that automates the creation and optimization of high-performance GPU kernels. The agentic system iterates through a structured workflow of generating, testing, and refining kernels written against the CuTe abstraction layer. It withholds low-level performance feedback until a kernel's high-level structure is stable, a choice intended to overcome limitations of previous LLM-based approaches. On the KernelBench benchmark, CuTeGen achieved an average speedup of 1.71x over PyTorch and surpassed a prior agentic baseline.
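The generate, test, and refine workflow with delayed performance feedback can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not CuTeGen's implementation: `refine_loop`, `Candidate`, the three callbacks, and the `stable_after` threshold are hypothetical names, and "structure is stable" is approximated here as a run of consecutive passing correctness checks.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Candidate:
    source: str                         # kernel source text proposed by the generator
    correct: bool = False               # whether it passed the correctness check
    runtime_ms: Optional[float] = None  # filled in only once profiling is enabled


def refine_loop(
    generate: Callable[[str], str],            # hypothetical: feedback -> kernel source
    check: Callable[[str], Tuple[bool, str]],  # hypothetical: source -> (ok, message)
    profile: Callable[[str], float],           # hypothetical: source -> runtime in ms
    max_iters: int = 8,
    stable_after: int = 2,                     # assumption: passes needed before profiling
) -> Tuple[Optional[Candidate], List[Candidate]]:
    feedback = "initial request"
    stable_passes = 0
    best: Optional[Candidate] = None
    history: List[Candidate] = []

    for _ in range(max_iters):
        source = generate(feedback)
        ok, message = check(source)
        cand = Candidate(source=source, correct=ok)
        history.append(cand)

        if not ok:
            # Correctness failure: reset stability and send back only the error.
            stable_passes = 0
            feedback = f"fix correctness: {message}"
            continue

        stable_passes += 1
        if stable_passes < stable_after:
            # Correct but not yet considered stable: withhold performance data.
            feedback = "correct; keep the structure, no performance feedback yet"
            continue

        # Structure treated as stable: now measure and report runtime.
        cand.runtime_ms = profile(source)
        if best is None or cand.runtime_ms < best.runtime_ms:
            best = cand
        feedback = f"optimize: measured runtime {cand.runtime_ms:.3f} ms"

    return best, history
```

In this sketch the generator never sees a timing number until it has produced `stable_after` consecutive correct kernels, which is one simple way to keep early iterations focused on structure and correctness rather than on micro-optimizations.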

IMPACT Automates complex GPU kernel development, potentially accelerating ML system performance and reducing reliance on expert programmers.

LLM
PyTorch
KernelBench
CuTeGen
CuTe
Tara Saba
CudaForge