ENTITY Cutlass

PulseAugur coverage of Cutlass: every cluster mentioning Cutlass across labs, papers, and developer communities, ranked by signal.

Total · 30d: 6 (6 over 90d)

Releases · 30d: 0 (0 over 90d)

Papers · 30d: 2 (2 over 90d)

SENTIMENT · 30D

2 days with sentiment data

RECENT · PAGE 1/1 · 6 TOTAL

TOOL · CL_111758 · Jun 26 · 04:00

New system KernelPro autonomously optimizes GPU kernel code using LLMs

Researchers have developed KernelPro, an autonomous system designed to optimize GPU kernel code for large language models. This system integrates LLM code generation with hardware profiler feedback and specialized analy…

TOOL · CL_75452 · Jun 6 · 22:04

CUDA/C++ inference engine built for NVIDIA's DVLT 3D model

A new inference engine called dvlt.cu has been developed from scratch using CUDA/C++ for NVIDIA's DVLT 3D transformer model. This standalone 5MB binary has minimal dependencies, relying only on cuBLASLt and the header-o…

TOOL · CL_51969 · May 26 · 08:50

TileLang simplifies GPU kernel writing with Python interface

A new programming language called TileLang aims to simplify GPU kernel development by offering a middle ground between high-level frameworks like Triton and low-level control like CUTLASS. TileLang allows developers to …

RESEARCH · CL_54660 · May 23 · 12:11

GPU Matrix Multiplications Faster With Predictable Data

Researchers have discovered that matrix multiplications on GPUs can perform faster when the input data is "predictable." Initially, a project called CUTLASS showed a 10% performance improvement over NVIDIA's cuBLAS. How…

RESEARCH · CL_13517 · May 3 · 08:26

CuTeDSL emerges as new GPU kernel path for LLM inference, challenging CUTLASS

The landscape of GPU kernel engineering for LLM inference is shifting, with CuTeDSL emerging as a potential successor to C++ CuTe/CUTLASS. This evolution is highlighted by industry trends in technologies like FlashAtten…

RESEARCH · CL_11176 · May 1 · 01:38

Moonshot AI open-sources FlashKDA, boosting Kimi Delta Attention 2.5x on H200 GPUs

Moonshot AI has released FlashKDA, an open-source implementation of Kimi Delta Attention. This new kernel achieves up to 2.5 times faster inference speeds on NVIDIA H200 GPUs. It is built using CUTLASS and optimized for…