PulseAugur
Entity: Triton

Triton

PulseAugur coverage of Triton: every cluster mentioning Triton across labs, papers, and developer communities, ranked by signal.

Total (30 days): 8 · 90 days: 8
Releases (30 days): 0 · 90 days: 0
Papers (30 days): 4 · 90 days: 4
Sentiment (30 days): data available for 3 days

Recent · 8 items
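  1. RESEARCH · CL_44358

    Together AI releases FlashAttention-3 and FlashAttention-4 to speed up large language model processing

    Together AI has released FlashAttention-3 and FlashAttention-4, major upgrades to its GPU-accelerated attention mechanism for large language models. FlashAttention-3 targets Hopper GPUs: by exploiting new hardware features such as the Tensor Cores and the Tensor Memory Accelerator (TMA), and by supporting FP8 precision, it reaches up to 75% of the H100's theoretical peak FLOPS and runs 1.5-2x faster than FlashAttention-2. FlashAttenti…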

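  2. RESEARCH · CL_43418

    Stanford's ThunderKittens DSL optimizes AI kernel performance

    A new article details ThunderKittens, a compact domain-specific language (DSL) developed by Stanford's Hazy Research Lab for writing high-performance AI kernels. The DSL aims to balance research productivity with hardware efficiency by abstracting away repetitive GPU programming tasks such as tile layouts and memory allocation. Developers can therefore keep close control over data movement and scheduling while still tuning modern AI workloads for NVIDIA hardware such as Hopper and Blackwell.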

  3. RESEARCH · CL_31391

    Moore Threads rallies open-source AI dev community for MUSA GPU ecosystem

    Chinese GPU maker Moore Threads has hosted a meetup focused on integrating its MUSA architecture with key open-source large-model inference frameworks such as SGLang. The event brought together core developers from proje…

  4. RESEARCH · CL_30131

    New framework optimizes LLM inference energy use on multi-GPU systems

    Researchers have developed EnergyLens, a framework designed to optimize the energy consumption of large language models (LLMs) during inference on multi-GPU systems. This tool addresses the challenge of predicting and r…

  5. RESEARCH · CL_20462

    New benchmark reveals LLM-generated GPU kernels struggle with correctness and efficiency

    A new benchmark called KernelBench-X has been developed to evaluate the capabilities of large language models in generating GPU kernels. The benchmark, which covers 176 tasks across 15 categories, reveals that task stru…

  6. RESEARCH · CL_08388

    Triton language now runs efficiently on Huawei Ascend NPUs

    A new compilation framework, Triton-Ascend 3.2.0, has been released to enable the Triton programming language to run efficiently on Huawei's Ascend hardware. This framework simplifies operator development by automating …

  7. SIGNIFICANT · CL_07248

    Behind Ascend's day-one adaptation for DeepSeek V4: why does Ascend refuse to build a CUDA compatibility layer?

    Huawei's Ascend AI accelerators are forging a unique path by eschewing CUDA compatibility to build an independent ecosystem. This strategy focuses on deep architectural changes in their latest Ascend 950 chips to addres…

  8. RESEARCH · CL_06527

    New methods QFlash and ELSA boost Vision Transformer attention efficiency

    Researchers have developed two new methods to improve the efficiency of attention mechanisms in vision transformers. QFlash focuses on enabling integer-only operations for FlashAttention, achieving significant speedups …