LLM framework automates GPU kernel generation, outperforming PyTorch

By PulseAugur Editorial · [1 source] · 2026-06-05 04:00

Researchers have developed CuTeGen, a framework that automates the creation and optimization of high-performance GPU kernels. The agentic system follows a structured workflow of generating, testing, and refining kernels, and it targets the CuTe abstraction layer. CuTeGen withholds low-level performance feedback until a kernel's high-level structure is stable, aiming to overcome the limitations of previous LLM-based approaches. On the KernelBench benchmark, CuTeGen achieved an average speedup of 1.71x over PyTorch and surpassed a prior agentic baseline.
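The generate, test, and refine workflow with delayed performance feedback can be illustrated with a minimal Python sketch. This is not CuTeGen's implementation: the callbacks `propose`, `is_correct`, and `measure_ms` are hypothetical stand-ins for an LLM call, a correctness harness, and a timing harness, and treating "passes the correctness check" as the signal that the structure is stable is an assumption made here for illustration only.

```python
from typing import Callable, List, Optional, Tuple


def generate_test_refine(
    propose: Callable[[Optional[str], str], str],   # hypothetical: (previous kernel, feedback) -> new kernel source
    is_correct: Callable[[str], Tuple[bool, str]],  # hypothetical: kernel -> (passed, error message)
    measure_ms: Callable[[str], float],             # hypothetical: kernel -> runtime in milliseconds
    max_rounds: int = 8,
) -> Tuple[Optional[str], List[str]]:
    """Return the fastest correct kernel found and a log of the phase used each round."""
    best: Optional[str] = None
    best_ms = float("inf")
    log: List[str] = []
    base: Optional[str] = None
    feedback = "write an initial kernel"

    for _ in range(max_rounds):
        candidate = propose(base, feedback)
        ok, error = is_correct(candidate)
        if not ok:
            # Debug phase: the model sees only the correctness error.
            # No timing data is shown while the kernel is still broken.
            log.append("debug")
            base, feedback = candidate, f"fix: {error}"
            continue

        # Optimize phase: performance feedback is introduced only now
        # (assumption: a passing kernel counts as structurally stable).
        elapsed = measure_ms(candidate)
        if elapsed < best_ms:
            best, best_ms = candidate, elapsed
        log.append("optimize")
        base, feedback = best, f"runtime {best_ms:.2f} ms; make it faster"

    return best, log
```

In this sketch a slower or incorrect candidate never replaces the best kernel found so far, so a failed optimization attempt costs one round without discarding earlier progress.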

IMPACT Automates complex GPU kernel development, potentially accelerating ML system performance and reducing reliance on expert programmers.

RANK_REASON The cluster contains an academic paper detailing a new framework for GPU kernel generation.



AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

arXiv cs.LG TIER_1 English(EN) · Tara Saba, Zhiyang Chen, Jikai Jason Li, Anne Ouyang, Xujie Si, Fan Long · 2026-06-05 04:00

CuTeGen: An LLM-Based Agentic Framework for Generation and Optimization of High-Performance GPU Kernels using CuTe

arXiv:2604.01489v2 Announce Type: replace Abstract: High-performance GPU kernels are critical to modern machine learning systems, yet developing them remains a manual, expert-driven process. Recent work has explored using LLMs to automate kernel generation, but generated kernels …

