Brief



Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
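The line above names the four scoring signals but not how they are combined. The sketch below shows one plausible reading: a weighted sum of normalized signals, multiplied by an exponential time decay and scaled to 0–100. The weights, the 24-hour half-life, and the `Story` fields are all hypothetical and are not the site's actual values.

```python
from dataclasses import dataclass


@dataclass
class Story:
    authority: float         # 0..1, hypothetical source-reputation signal
    cluster_strength: float  # 0..1, e.g. normalized number of sources in the cluster
    headline_signal: float   # 0..1, hypothetical headline-quality signal
    age_hours: float         # hours since the story was first seen


# Hypothetical weights (summing to 1.0) and decay half-life.
WEIGHTS = {"authority": 0.4, "cluster_strength": 0.35, "headline_signal": 0.25}
HALF_LIFE_HOURS = 24.0


def score(story: Story) -> int:
    """Weighted signal sum, halved every HALF_LIFE_HOURS, clamped to 0..100."""
    base = (WEIGHTS["authority"] * story.authority
            + WEIGHTS["cluster_strength"] * story.cluster_strength
            + WEIGHTS["headline_signal"] * story.headline_signal)
    decay = 0.5 ** (story.age_hours / HALF_LIFE_HOURS)
    return round(100 * min(max(base * decay, 0.0), 1.0))
```

Under these assumptions, a story with perfect signals scores 100 when fresh and 50 after one half-life.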

TOOL · arXiv cs.LG English(EN) · 7h

CuTeGen: An LLM-Based Agentic Framework for Generation and Optimization of High-Performance GPU Kernels using CuTe

Researchers have developed CuTeGen, an agentic framework that automates the generation and optimization of high-performance GPU kernels. It follows a structured generate-test-refine workflow that targets the CuTe abstraction layer. CuTeGen withholds low-level performance feedback until a kernel's high-level structure has stabilized, a design intended to address limitations of earlier LLM-based approaches. On the KernelBench benchmark, CuTeGen achieved an average speedup of 1.71x over PyTorch and outperformed a prior agentic baseline.
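The staged workflow described above can be sketched as a two-phase loop. Phase 1 iterates on correctness alone, and only once a correct kernel exists does phase 2 feed runtimes back to the generator. This is an illustrative sketch, not CuTeGen's implementation: `generate`, `check`, and `profile` are hypothetical callables standing in for the LLM, the test harness, and the profiler.

```python
from typing import Callable, Optional

# Hypothetical interfaces; none of these names come from the CuTeGen paper.
Generate = Callable[[str, str], str]    # (task, feedback) -> kernel source
Check = Callable[[str], Optional[str]]  # kernel -> error message, or None if correct
Profile = Callable[[str], float]        # kernel -> runtime in ms


def refine_kernel(task: str, generate: Generate, check: Check, profile: Profile,
                  max_fix_rounds: int = 5, max_opt_rounds: int = 5) -> Optional[str]:
    # Phase 1: fix correctness only; no performance feedback is given yet.
    feedback, kernel = "", None
    for _ in range(max_fix_rounds):
        candidate = generate(task, feedback)
        error = check(candidate)
        if error is None:
            kernel = candidate
            break
        feedback = f"fix: {error}"
    if kernel is None:
        return None  # never reached a correct kernel
    # Phase 2: the structure is stable, so now report timings and keep
    # only candidates that are both correct and faster than the current best.
    best_time = profile(kernel)
    for _ in range(max_opt_rounds):
        candidate = generate(task, f"optimize: current runtime {best_time:.3f} ms")
        if check(candidate) is None:
            t = profile(candidate)
            if t < best_time:
                kernel, best_time = candidate, t
    return kernel
```

Keeping timing data out of phase 1 means the generator is never asked to trade correctness for speed before a working structure exists.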

IMPACT Automates complex GPU kernel development, potentially accelerating ML system performance and reducing reliance on expert programmers.
- KernelBench
- CuTeGen
- CuTe
- PyTorch
- CudaForge
- LLM
- Tara Saba

RESEARCH · arXiv cs.LG English(EN) · 1d · [2 sources]

MusaCoder: Native GPU Kernel Generation with Full-Stack Training on Moore Threads GPU

Researchers have developed MusaCoder, a framework for generating native GPU kernels for Moore Threads GPUs; such kernels are essential for efficient low-level code execution. The system uses a full-stack training pipeline that combines data synthesis, rejection fine-tuning, and reinforcement learning (RL) with a dedicated verification environment called MooreEval. MusaCoder also introduces several techniques to stabilize RL training, which improve both correctness and speedup over existing models. Its larger variant sets a new state of the art for native GPU kernel generation.
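The rejection fine-tuning stage mentioned above generally works by sampling several candidates per prompt and keeping only those a verifier accepts as training data. The sketch below shows that filtering step. `sample` and `verify` are hypothetical stand-ins for the policy model and for a verification environment such as MooreEval, whose real interface is not described here.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins: `sample` plays the policy model and `verify` plays
# the verification environment (e.g. compile + run + compare outputs).
Sample = Callable[[str, int], List[str]]  # (prompt, n) -> n candidate kernels
Verify = Callable[[str, str], bool]       # (prompt, kernel) -> passes correctness tests?


def rejection_sample(prompts: List[str], sample: Sample, verify: Verify,
                     n: int = 8, keep_per_prompt: int = 1) -> List[Tuple[str, str]]:
    """Build (prompt, kernel) fine-tuning pairs from verified samples only."""
    dataset: List[Tuple[str, str]] = []
    for prompt in prompts:
        accepted = [k for k in sample(prompt, n) if verify(prompt, k)]
        unique = list(dict.fromkeys(accepted))  # deduplicate, preserving order
        dataset.extend((prompt, k) for k in unique[:keep_per_prompt])
    return dataset
```

Prompts for which no sample verifies contribute nothing, so the fine-tuning set contains only kernels that passed the checker.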

IMPACT Establishes a new state of the art in native GPU kernel generation, potentially accelerating AI development on emerging hardware.

TOOL · arXiv cs.LG English(EN) · 4d

Kernel Foundry: A Diagnosis-driven Evolutionary Kernel Optimizer with Multi-Experts

Researchers have developed Kernel Foundry, an evolutionary framework that optimizes GPU kernels for both correctness and performance. It uses large language models to generate initial code, then refines the kernels through a multi-expert evolutionary search guided by diagnostic feedback. An experience library stores reusable optimization knowledge to improve later kernel generation, and safeguards are in place to prevent incorrect computations.
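The search described above can be sketched as a small evolutionary loop. A diagnosis routes each parent kernel to a matching expert, a correctness gate discards wrong children, and an experience library records mutations that produced speedups. All names here (`diagnose`, `experts`, `evaluate`, the library layout) are hypothetical illustrations, not Kernel Foundry's API.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical interfaces; none of these names come from the Kernel Foundry paper.
Diagnose = Callable[[str], str]              # kernel -> bottleneck label, e.g. "memory"
Expert = Callable[[str, List[str]], str]     # (parent kernel, library hints) -> child kernel
Evaluate = Callable[[str], Optional[float]]  # kernel -> runtime, or None if output is wrong


def evolve(seed: str, experts: Dict[str, Expert], diagnose: Diagnose,
           evaluate: Evaluate, generations: int = 10, population: int = 4,
           rng: Optional[random.Random] = None,
           ) -> Tuple[str, float, Dict[str, List[str]]]:
    if rng is None:
        rng = random.Random(0)
    seed_time = evaluate(seed)
    if seed_time is None:
        raise ValueError("seed kernel must be correct")
    pool: List[Tuple[float, str]] = [(seed_time, seed)]
    library: Dict[str, List[str]] = {}  # diagnosis label -> kernels that helped before
    for _ in range(generations):
        parent_time, parent = rng.choice(pool)
        label = diagnose(parent)
        # Route to the expert for this diagnosis, falling back to a random expert.
        expert = experts[label] if label in experts else rng.choice(list(experts.values()))
        child = expert(parent, library.get(label, []))
        child_time = evaluate(child)
        if child_time is None:
            continue  # correctness gate: incorrect kernels never enter the pool
        if child_time < parent_time:
            library.setdefault(label, []).append(child)  # remember what worked
        pool = sorted(pool + [(child_time, child)])[:population]  # keep the fastest
    best_time, best = pool[0]
    return best, best_time, library
```

Because the gate runs before selection, a fast but incorrect kernel can never displace a correct one.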

IMPACT Introduces a novel approach to GPU kernel optimization, potentially improving performance and correctness for AI workloads.
- LLMs
- GPU
- KernelBench
- Kernel Foundry
