PulseAugur

FlashAttention

PulseAugur coverage of FlashAttention: every cluster mentioning FlashAttention across labs, papers, and developer communities, ranked by signal.

Total · 30d: 26 (26 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 19 (19 over 90d)
Sentiment · 30d: data available for 6 days

RECENT · 26 TOTAL (first 20 shown)
  1. RESEARCH · CL_108502

    New EpiKV method optimizes LLM KV cache, boosting efficiency and context length

    A new research paper introduces EpiKV, a method for optimizing KV cache eviction in large language models. Unlike previous methods that rely on attention weights, EpiKV uses an "epiphany score" derived from changes in t…

  2. RESEARCH · CL_112712

    New book details modern GPU programming for AI workloads

    A new book titled "Modern GPU Programming for MLSys" aims to demystify high-performance GPU kernel development for machine learning systems. The book, originating from Carnegie Mellon University's Machine Learning Syste…

  3. TOOL · CL_101987

    Free 15-part series explains LLM internals with Gemma 4 12B

    A 15-part series delves into the inner workings of Large Language Models, using Gemma 4 12B as a practical example. The series covers topics from tokenization and tensor shapes to inference, memory constraints, and fine…

  4. SIGNIFICANT · CL_101878

    Subquadratic unveils SubQ LLM for single-pass codebase processing

    Subquadratic Inc. has unveiled SubQ, a new long-context language model that claims to process entire codebases or document sets in a single pass. The model utilizes a subquadratic, sparse-attention design, which theoret…

  5. TOOL · CL_91640

    Flash-KMeans speeds up GPU k-means clustering by more than 200x

    Researchers from UC Berkeley and UT Austin have developed Flash-KMeans, an open-source library that significantly accelerates the k-means clustering algorithm for modern AI pipelines. By optimizing data movement on GPUs…

  6. TOOL · CL_79834

    Math framework slashes transformer memory use, boosts speed

    Researchers have developed a new framework called Mathematics of Arrays (MoA) to optimize transformer kernels, which are computationally intensive components of modern AI models. This framework uses algebraic constructi…

  7. TOOL · CL_62826

    New bias method enables faster, scalable Super-Resolution Transformers

    Researchers have developed a new method called Rank-factorized Implicit Neural Bias (RIB) to improve the efficiency of Super-Resolution Transformers. This technique allows these models to utilize hardware-accelerated ke…

  8. RESEARCH · CL_55666

    OSP-Next video model achieves 83.73% VBench score with efficiency gains

    Researchers have introduced OSP-Next, a novel text-to-video generation model designed for enhanced efficiency and quality. The model integrates sparse attention mechanisms, a novel Sparse Sequence Parallelism (SSP) tech…

  9. RESEARCH · CL_48931

    New technique slashes I/O costs for LLM attention mechanisms

    Researchers have developed a new technique to significantly reduce the I/O complexity of attention mechanisms in large language models. This method aims to minimize data transfers between fast and slow memory, a critica…

  10. RESEARCH · CL_35013

    Nous Research's Lighthouse Attention speeds up LLM pretraining

    Researchers at Nous Research have developed Lighthouse Attention, a novel hierarchical attention mechanism designed to accelerate the pretraining of large language models with long contexts. This method achieves a 1.4x …

  11. RESEARCH · CL_44749

    New research tackles attention mechanism limitations in transformers

    Researchers are exploring novel approaches to enhance the efficiency and effectiveness of attention mechanisms in transformers. Several papers introduce methods to mitigate issues like over-smoothing and computational b…

  12. RESEARCH · CL_36554

    New research tackles diffusion language model limitations

    Researchers are exploring new methods to improve diffusion language models (DLMs), which offer faster inference than autoregressive models. Several recent papers introduce techniques to enhance DLM performance, includin…

  13. TOOL · CL_31732

    Guide details building FlashAttention wheel file for ML integration

    This article provides a guide on how to build and install version 2.8.3 of FlashAttention. It focuses on the technical process of creating a wheel file, which is a standard distribution format for Python packages. The g…

  14. RESEARCH · CL_34499

    New attention methods tackle LLM long-context challenges

    Researchers are developing new attention mechanisms to handle increasingly long contexts in large language models. One approach, Runtime-Certified Bounded-Error Quantized Attention, uses tiered KV caches to compress mem…

  15. RESEARCH · CL_14450

    Researchers explore novel attention mechanisms and optimization techniques for LLMs

    Researchers are exploring novel attention mechanisms to overcome the quadratic complexity of standard self-attention in transformers, particularly for long-context processing. Several papers introduce methods like Light…

  16. RESEARCH · CL_10154

    OVGGT achieves constant-cost streaming for 3D geometry reconstruction

    Researchers have introduced OVGGT, a novel framework designed for reconstructing 3D geometry from streaming video with constant memory and compute costs. This training-free approach addresses the limitations of previous…

  17. RESEARCH · CL_10106

    Focus method enhances LLM attention efficiency without performance loss

    Researchers have developed a new method called Focus, designed to improve the efficiency of attention mechanisms in large language models. Standard attention scales quadratically with sequence length, leading to high co…

  18. RESEARCH · CL_06527

    New methods QFlash and ELSA boost Vision Transformer attention efficiency

    Researchers have developed two new methods to improve the efficiency of attention mechanisms in vision transformers. QFlash focuses on enabling integer-only operations for FlashAttention, achieving significant speedups …

  19. SIGNIFICANT · CL_05912

    Together AI powers national scientific mission with open-source infrastructure

    Together AI, an open-source AI lab, has announced its participation in the Genesis Mission, a project aimed at doubling American scientific productivity over the next decade. The initiative connects supercomputers, experim…

  20. TOOL · CL_47658

    Together AI kernels team optimizes GPUs with FlashAttention

    The Together AI kernels team, which includes FlashAttention co-authors Dan Fu and Tri Dao, works on FlashAttention, an IO-aware attention kernel that significantly improves GPU performance for AI models. This breakthrough, achieved by applying data…
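Several items above (for example items 9 and 20) turn on the same idea: compute exact attention while cutting data movement between slow GPU memory and fast on-chip memory. The sketch below is a minimal, CPU-only NumPy illustration of that idea, assuming nothing about the FlashAttention library's actual API. It processes keys and values in blocks and keeps a running (online) softmax, so only one tile of scores exists at a time instead of the full score matrix. The function names `naive_attention` and `tiled_attention` and the `block_size` parameter are hypothetical, chosen for illustration; the real kernels are fused CUDA code that also handle masking, the backward pass, and hardware-specific tiling.

```python
import numpy as np


def naive_attention(q, k, v):
    """Reference softmax(Q K^T / sqrt(d)) V that materializes the full score matrix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def tiled_attention(q, k, v, block_size=4):
    """Exact attention over key/value blocks using an online (running) softmax.

    Hypothetical illustration: only an (n_q x block_size) tile of scores is
    held at a time. block_size is an arbitrary assumption, not a tuned value.
    """
    n_q, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n_q, v.shape[-1]))  # unnormalized running output
    row_max = np.full(n_q, -np.inf)     # running max of scores per query row
    row_sum = np.zeros(n_q)             # running softmax denominator per row

    for start in range(0, k.shape[0], block_size):
        k_blk = k[start:start + block_size]
        v_blk = v[start:start + block_size]
        s = q @ k_blk.T * scale  # scores for this tile only

        new_max = np.maximum(row_max, s.max(axis=-1))
        # Rescale earlier accumulators to the new max (factor is 0 on the first tile).
        correction = np.exp(row_max - new_max)
        p = np.exp(s - new_max[:, None])

        row_sum = row_sum * correction + p.sum(axis=-1)
        out = out * correction[:, None] + p @ v_blk
        row_max = new_max

    # Normalize once at the end, after all tiles have been seen.
    return out / row_sum[:, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((8, 16)) for _ in range(3))
    diff = np.abs(tiled_attention(q, k, v, block_size=3) - naive_attention(q, k, v)).max()
    print(f"max abs difference vs. naive attention: {diff:.2e}")
```

The key design choice is the correction factor `exp(old_max - new_max)`: it lets partial sums from earlier tiles be reused rather than recomputed when a larger score appears, which is what makes a single pass over key/value blocks sufficient for an exact result.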