PulseAugur
Triton

PulseAugur coverage of Triton — every cluster mentioning Triton across labs, papers, and developer communities, ranked by signal.

Total · 30d: 20 (20 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 9 (9 over 90d)
SENTIMENT · 30D: 8 days with sentiment data

RECENT · PAGE 1/1 · 20 TOTAL
  1. TOOL · CL_111511 ·

    TileMaxSim kernel boosts GPU retrieval model speed by 220x

    Researchers have developed TileMaxSim, a new IO-aware kernel for GPUs designed to significantly accelerate the MaxSim scoring process used in multi-vector retrieval models like ColBERT. Existing implementations are inef…

  2. RESEARCH · CL_104433 ·

    Apache TVM launches TIRx compiler for evolving ML kernels and hardware

    Apache TVM has launched TIRx, an open-source compiler stack designed for machine learning kernels and evolving hardware. This new system allows for hardware-native DSLs and compilation to GPUs and specialized AI acceler…

  3. SIGNIFICANT · CL_96603 ·

    Sunmmio tapes out 3D TokenPU chip, boosting China's AI compute power

    Sunmmio has officially taped out its 3D TokenPU chip, the A4E, designed for large model inference. This marks a significant step for China's domestic AI chip industry, utilizing a 3D hybrid stacking architecture to addr…

  4. TOOL · CL_96296 ·

    AMD User Seeks Triton/Sage Attention Integration for ComfyUI

    A user is seeking assistance with integrating Triton and Sage Attention into ComfyUI on a Windows 11 system with an AMD Radeon 8050S GPU. They are encountering errors related to the 'triton' module not being found, whic…

  5. RESEARCH · CL_93361 ·

    LLMs struggle with GPU kernel generation; new research offers solutions

    Two new research papers explore the challenges of generating correct GPU kernels using large language models (LLMs). The first paper, "The Correctness Illusion in LLM-Generated GPU Kernels," identifies that existing ben…

  6. RESEARCH · CL_93380 ·

    daVinci-kernel uses RL to optimize GPU kernels with evolving skill library

    Researchers have developed daVinci-kernel, a novel reinforcement learning framework designed to optimize GPU kernels. This system co-evolves skill selection, summarization, and utilization, employing three agents that s…

  7. TOOL · CL_91640 ·

    Flash-KMeans accelerates GPU k-means clustering over 200x

    Researchers from UC Berkeley and UT Austin have developed Flash-KMeans, an open-source library that significantly accelerates the k-means clustering algorithm for modern AI pipelines. By optimizing data movement on GPUs…

  8. RESEARCH · CL_81952 ·

    Flash-GMM kernel speeds up GMM clustering 20x, enables larger datasets

    Researchers have developed Flash-GMM, a new fused Triton kernel designed for efficient Gaussian Mixture Model (GMM) computations on GPUs. This kernel significantly reduces memory requirements by avoiding the materializa…

  9. RESEARCH · CL_72140 ·

    Build Your Own LLM Workshop Released on YouTube

    A YouTube workshop is available for individuals interested in building their own large language models without prior math or ML experience. The workshop covers fundamental concepts like neural networks and transformer a…

  10. RESEARCH · CL_63956 ·

    Majestic Labs unveils Prometheus server with 128TB memory

    AI startup Majestic Labs is developing a new server called Prometheus, designed to overcome the limitations of current AI hardware by significantly increasing memory capacity. The server will feature up to 128 terabytes…

  11. TOOL · CL_54717 ·

    Triton MoE kernel achieves high performance on AMD, NVIDIA

    A new fused Mixture-of-Experts (MoE) dispatch kernel, written entirely in Triton, achieves 89-131% of the performance of Stanford's MegaBlocks library. This kernel notably runs on AMD MI300X hardware without any code mo…

  12. TOOL · CL_51969 ·

    TileLang simplifies GPU kernel writing with Python interface

    A new programming language called TileLang aims to simplify GPU kernel development by offering a middle ground between high-level frameworks like Triton and low-level control like CUTLASS. TileLang allows developers to …

  13. RESEARCH · CL_44358 ·

    Together AI releases FlashAttention-3 and -4 for faster LLM processing

    Together AI has released FlashAttention-3 and FlashAttention-4, significant upgrades to their GPU-accelerated attention mechanism for large language models. FlashAttention-3, designed for Hopper GPUs, achieves up to 75%…

  14. RESEARCH · CL_43418 ·

    Stanford's ThunderKittens DSL optimizes AI kernel performance

    A new article details ThunderKittens, a compact domain-specific language (DSL) developed at Stanford's Hazy Research Lab for creating high-performance AI kernels. The DSL aims to strike a balance between research produc…

  15. RESEARCH · CL_31391 ·

    Moore Threads rallies open-source AI dev community for MUSA GPU ecosystem

    Chinese GPU maker Moore Threads has convened a meetup focused on integrating its MUSA architecture with key open-source large model inference frameworks like SGLang. The event brought together core developers from proje…

  16. RESEARCH · CL_30131 ·

    New framework optimizes LLM inference energy use on multi-GPU systems

    Researchers have developed EnergyLens, a framework designed to optimize the energy consumption of large language models (LLMs) during inference on multi-GPU systems. This tool addresses the challenge of predicting and r…

  17. RESEARCH · CL_20462 ·

    New benchmark reveals LLM-generated GPU kernels struggle with correctness and efficiency

    A new benchmark called KernelBench-X has been developed to evaluate the capabilities of large language models in generating GPU kernels. The benchmark, which covers 176 tasks across 15 categories, reveals that task stru…

  18. RESEARCH · CL_08388 ·

    Triton language now runs efficiently on Huawei Ascend NPUs

    A new compilation framework, Triton-Ascend 3.2.0, has been released to enable the Triton programming language to run efficiently on Huawei's Ascend hardware. This framework simplifies operator development by automating …

  19. SIGNIFICANT · CL_07248 ·

    Behind DeepSeek V4's launch-day adaptation: why does Ascend insist on not building a CUDA compatibility layer?

    Huawei's Ascend AI accelerators are forging a unique path by eschewing CUDA compatibility to build an independent ecosystem. This strategy focuses on deep architectural changes in their latest Ascend 950 chips to addres…

  20. RESEARCH · CL_06527 ·

    New methods QFlash and ELSA boost Vision Transformer attention efficiency

    Researchers have developed two new methods to improve the efficiency of attention mechanisms in vision transformers. QFlash focuses on enabling integer-only operations for FlashAttention, achieving significant speedups …