PulseAugur

Cutlass

PulseAugur coverage of Cutlass: every cluster mentioning Cutlass across labs, papers, and developer communities, ranked by signal.

Total · 30d: 6 (6 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 2 (2 over 90d)

RECENT · 6 TOTAL
  1. TOOL · CL_111758

    New system KernelPro autonomously optimizes GPU kernel code using LLMs

    Researchers have developed KernelPro, an autonomous system designed to optimize GPU kernel code for large language models. This system integrates LLM code generation with hardware profiler feedback and specialized analy…

  2. TOOL · CL_75452

    CUDA/C++ inference engine built for NVIDIA's DVLT 3D model

    A new inference engine called dvlt.cu has been developed from scratch using CUDA/C++ for NVIDIA's DVLT 3D transformer model. This standalone 5MB binary has minimal dependencies, relying only on cuBLASLt and the header-o…

  3. TOOL · CL_51969

    TileLang simplifies GPU kernel writing with Python interface

    A new programming language called TileLang aims to simplify GPU kernel development by offering a middle ground between high-level frameworks like Triton and low-level control like CUTLASS. TileLang allows developers to …

  4. RESEARCH · CL_54660

    GPU Matrix Multiplications Run Faster With Predictable Data

    Researchers have discovered that matrix multiplications on GPUs can run faster when the input data is "predictable." Initially, a project called CUTLASS showed a 10% performance improvement over NVIDIA's cuBLAS. How…

  5. RESEARCH · CL_13517

    CuTeDSL emerges as new GPU kernel path for LLM inference, challenging CUTLASS

    The landscape of GPU kernel engineering for LLM inference is shifting, with CuTeDSL emerging as a potential successor to C++ CuTe/CUTLASS. This evolution is highlighted by industry trends in technologies like FlashAtten…

  6. RESEARCH · CL_11176

    Moonshot AI open-sources FlashKDA, boosting Kimi Delta Attention 2.5x on H200 GPUs

    Moonshot AI has released FlashKDA, an open-source implementation of Kimi Delta Attention. This new kernel achieves up to 2.5 times faster inference speeds on NVIDIA H200 GPUs. It is built using CUTLASS and optimized for…