CuTe-DSL
PulseAugur coverage of CuTe-DSL — every cluster mentioning CuTe-DSL across labs, papers, and developer communities, ranked by signal.
-
GB200 NVL72 serving costs slashed 2.5x via software upgrades
Software optimizations for the GB200 NVL72 have cut serving costs by a factor of 2.5 in under 70 days. These improvements, particularly the rewriting of the NVFP4 MoE kernel using CuTe-DSL and leveraging the N…
-
Modal optimizes FlashAttention-4 for faster LLM inference
Modal has enhanced the FlashAttention-4 kernel to improve inference speed for large language models, particularly for decode-heavy workloads. Their contributions focused on adjusting parallelism strategies, such as shif…
-
NVIDIA open-sources cuDNN kernels after 12 years, including MoE and sparse attention
NVIDIA has open-sourced parts of its cuDNN library after keeping it closed-source for 12 years. The release includes over 20 Mixture-of-Experts (MoE) kernels and NSA sparse attention kernels. The codeb…