New system KernelPro autonomously optimizes GPU kernel code using LLMs

By PulseAugur Editorial · [1 source] · 2026-06-26 04:00

Researchers have developed KernelPro, a closed-loop multi-agent system that uses large language models (LLMs) to generate, profile, and iteratively optimize GPU kernel code. The system integrates LLM code generation with hardware profiler feedback and specialized analysis tools. KernelPro introduces a semantic feedback operator that gives actionable guidance, a two-stage tool invocation architecture for efficient bottleneck analysis, and direct CuTe source-level code generation. According to the summary, the system achieved significant speedups on benchmark datasets, improved on expert-optimized kernels, and also targets energy efficiency.
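The generate, profile, and refine loop described above can be illustrated with a minimal Python sketch. It is not KernelPro's implementation: `optimize_kernel`, `toy_profile`, and `toy_propose` are hypothetical names, the "kernel" is just a string such as `tile=16`, and the profiler and proposer are toy stand-ins for a real hardware profiler and an LLM. The only point is the control flow, in which each candidate is measured and the profiler's feedback steers the next proposal.

```python
from typing import Callable, Tuple

# Hypothetical interfaces, assumed for illustration only.
ProfileFn = Callable[[str], Tuple[float, str]]   # source -> (runtime_ms, feedback)
ProposeFn = Callable[[str, str], str]            # (source, feedback) -> new source


def optimize_kernel(source: str, propose: ProposeFn, profile: ProfileFn,
                    rounds: int = 4) -> Tuple[str, float]:
    """Keep the fastest candidate seen while feeding profiler hints back."""
    best_src = source
    best_ms, feedback = profile(source)
    for _ in range(rounds):
        candidate = propose(best_src, feedback)
        cand_ms, cand_feedback = profile(candidate)
        if cand_ms < best_ms:
            # Accept the faster candidate and continue from its feedback.
            best_src, best_ms, feedback = candidate, cand_ms, cand_feedback
        else:
            # Reject it, but tell the proposer what was tried and why it lost.
            feedback = f"rejected candidate ({cand_ms:.2f} ms): {cand_feedback}"
    return best_src, best_ms


def toy_profile(source: str) -> Tuple[float, str]:
    """Toy stand-in for a profiler: runtime is lowest at tile=64."""
    tile = int(source.split("=")[1])
    runtime_ms = abs(tile - 64) / 16 + 1.0
    if tile < 64:
        hint = "increase tile"
    elif tile > 64:
        hint = "decrease tile"
    else:
        hint = "balanced"
    return runtime_ms, hint


def toy_propose(source: str, feedback: str) -> str:
    """Toy stand-in for an LLM: follows the hint by doubling or halving."""
    tile = int(source.split("=")[1])
    if "increase" in feedback:
        tile *= 2
    elif "decrease" in feedback:
        tile //= 2
    return f"tile={tile}"


best_source, best_runtime = optimize_kernel("tile=16", toy_propose, toy_profile)
print(best_source, best_runtime)
```

In a real system the proposer would be an LLM call and the profiler would compile and time the kernel on hardware; keeping both behind plain function interfaces makes the loop easy to test with stand-ins like these.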

IMPACT This system could significantly accelerate the development and deployment of high-performance AI models by optimizing the underlying GPU computations.

RANK_REASON The cluster describes a research paper detailing a new system for optimizing GPU kernel code.


paper
infra

AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

arXiv cs.LG TIER_1 English(EN) · Jiading Gai, Shuai Zhang, Kaj Bostrom, Jin Huang, Vihang Patil, Haoyang Fang, Bernie Wang, Huzefa Rangwala, George Karypis · 2026-06-26 04:00

Optimizing CUDA like a Human: Micro-Profiling Tools as Expert Surrogates for LLM-Based GPU Kernel Optimization

arXiv:2606.26453v1. Abstract: We present KernelPro, a closed-loop multi-agent system that automatically generates, profiles, and iteratively optimizes GPU kernel code by integrating large language model (LLM) code generation with hardware profiler feedback and p…

