Brief

last 24h

[2/2] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
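As a rough illustration of how four such signals could fold into one 0–100 score, here is a minimal sketch. The brief names the signals but not the formula, so the weights, the 24-hour half-life, and the function names `time_decay` and `score` are all assumptions made for this example, not the aggregator's actual method.

```python
# Hypothetical weights: the brief does not say how the signals are combined.
WEIGHTS = {"authority": 0.5, "cluster_strength": 0.25, "headline_signal": 0.25}
HALF_LIFE_HOURS = 24.0  # assumed: a story loses half its score every 24 hours


def time_decay(age_hours, half_life=HALF_LIFE_HOURS):
    """Exponential decay factor in (0, 1]; equals 1.0 for a brand-new story."""
    return 0.5 ** (age_hours / half_life)


def score(authority, cluster_strength, headline_signal, age_hours):
    """Combine three signals in [0, 1] with a decay factor into a 0-100 integer."""
    signals = {
        "authority": authority,
        "cluster_strength": cluster_strength,
        "headline_signal": headline_signal,
    }
    for name, value in signals.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    base = sum(WEIGHTS[name] * value for name, value in signals.items())
    return round(100 * base * time_decay(age_hours))
```

Under these assumed weights, a story with perfect signals scores 100 when fresh and 50 one half-life later.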

RESEARCH · Lobsters — AI tag English(EN) · 4d · [3 sources]

Dissecting ThunderKittens, anatomy of a compact DSL for high-performance AI kernels

A new article dissects ThunderKittens, a compact domain-specific language (DSL) from Stanford's Hazy Research Lab for writing high-performance AI kernels. The DSL aims to balance research productivity against hardware efficiency by abstracting repetitive GPU programming chores such as tile layouts and memory allocation. Developers can still reason closely about data movement and scheduling, and can still tune performance for modern AI workloads on NVIDIA's Hopper and Blackwell architectures.

IMPACT Enables more efficient AI model training and inference by optimizing low-level GPU kernel performance.
- NVIDIA
- AI
- Stanford
- FlashAttention-2
- Hopper
- PyTorch
- CUDA
- GPU
- Blackwell
- Triton
- Hazy Research Lab
- ThunderKittens

TOOL · Together AI blog English(EN) · 1mo

Inside the Together AI kernels team

The Together AI kernels team, whose researchers include Dan Fu and Tri Dao, is behind FlashAttention, an attention implementation that makes far better use of GPU hardware for AI models. By applying database-systems principles to how data moves through GPU memory, it delivered 2-3x speedups and challenged the assumption that transformer attention was already fully optimized. The team's later work, including the ThunderKittens library, aims to speed up kernel development for new hardware such as NVIDIA's Blackwell GPUs, addressing the software-hardware gap in AI infrastructure.

IMPACT Optimizes AI inference and training by bridging the software-hardware gap, potentially lowering costs and improving responsiveness.
- NVIDIA
- Stanford
- Together AI
- Andrej Karpathy
- Tesla
- GPU
- FlashAttention
- ThunderKittens
- Tri Dao
- Dan Fu
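
The memory-movement idea behind FlashAttention can be illustrated with a small pure-Python sketch: attention for one query is computed block by block over the keys and values, with a running maximum and running sum so the full score row is never materialized. This is a simplified CPU illustration of the online-softmax technique under assumed names (`naive_attention`, `tiled_attention`, `block_size`), not the team's GPU kernel.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q, keys, values):
    """Reference: softmax(q.K / sqrt(d)) @ V with the whole score row in memory."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [_dot(q, k) * scale for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def tiled_attention(q, keys, values, block_size=2):
    """Same result, but visiting keys/values one block at a time (online softmax)."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(values[0])  # unnormalized weighted sum of values
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [_dot(q, k) * scale for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum (factor is 0.0 on the first block).
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Both functions return the same vector up to floating-point error; the tiled version only ever holds `block_size` scores at once, which is the property that lets a GPU kernel keep its working set in fast on-chip memory.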
