PulseAugur

softmax attention

PulseAugur coverage of softmax attention — every cluster mentioning softmax attention across labs, papers, and developer communities, ranked by signal.

Total · 30d: 12 (12 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 12 (12 over 90d)

RECENT · 12 TOTAL
  1. RESEARCH · CL_108502

    New EpiKV method optimizes LLM KV cache, boosting efficiency and context length

    A new research paper introduces EpiKV, a method for optimizing KV cache eviction in large language models. Unlike previous methods that rely on attention weights, EpiKV uses an "epiphany score" derived from changes in t…

  2. RESEARCH · CL_109619

    Lifelong AI Learning Needs Parametric Attention in Transformers, Paper Argues

    A new research paper proposes that achieving lifelong continual learning in AI agents necessitates the use of parametric forms of attention within transformer models. The paper argues that the current quadratic complexi…

  3. TOOL · CL_104717

    New research links transformer pathologies to general routing mechanisms

    A new arXiv paper proposes that common transformer pathologies, such as attention sinks and representation collapse, are not unique to attention mechanisms but are inherent to content-based routing under fixed similarit…

  4. RESEARCH · CL_84359

    Bayesian theory explains emergent copy heads in transformer attention

    Researchers have developed a Bayesian theory to explain the emergence of "copy heads" in transformer attention mechanisms. Their analysis of a single-layer softmax attention network reveals a phase transition in how the…

  5. TOOL · CL_82518

    Blurry Window Attention improves Transformer efficiency for long contexts

    Researchers have introduced Blurry Window Attention (BLA), a novel method designed to improve the efficiency of Transformer language models in handling long contexts. BLA addresses the quadratic complexity and growing K…

  6. TOOL · CL_64777

    Vision Transformers linearized for faster inference with TTT

    Researchers have developed a method to convert pretrained Vision Transformer models into linear-complexity Test-Time Training (TTT) architectures. This approach aligns architectural and representational properties, allo…

  7. RESEARCH · CL_20487

    New research explains how transformers perform in-context learning via gradient descent

    Two new arXiv papers explore the theoretical underpinnings of in-context learning (ICL) in transformers. One paper demonstrates how transformers can perform in-context logistic regression by implicitly executing normali…

  8. RESEARCH · CL_15493

    Linearizing Vision Transformer with Test-Time Training

    Researchers have developed a method to adapt pretrained softmax attention models to linear-complexity architectures using Test-Time Training (TTT). This approach addresses the representational gap between different atte…

  9. RESEARCH · CL_14475

    Transformers' expressive power explained by new measure-theoretic framework

    Researchers have introduced a new measure-theoretic framework to understand the expressive power of Transformer architectures in modeling contextual relations. This framework connects standard softmax attention to entro…

  10. RESEARCH · CL_11887

    Sigmoid attention improves biological foundation models with faster, stable training

    Researchers have developed a new attention mechanism called Sigmoid Attention, which offers significant improvements for training biological foundation models. This novel approach leads to better learned representations…

  11. RESEARCH · CL_06270

    Kwai Summary Attention compresses historical contexts for efficient long-context LLMs

    Researchers have introduced Kwai Summary Attention (KSA), a novel attention mechanism designed to address the quadratic time complexity of standard softmax attention in large language models. KSA aims to maintain a line…

  12. RESEARCH · CL_05008

    New architectures and frameworks target LLM serving bottlenecks for long contexts

    Researchers have developed novel architectures and techniques to address the escalating latency and energy consumption challenges in serving large language models (LLMs) with long contexts. One approach, AMMA, proposes …
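
The EpiKV summary is cut off before it defines the epiphany score, so the sketch below does not implement it. It only shows the generic step that score-based KV cache eviction shares: given some per-token importance score, keep the `budget` highest-scoring cache entries and drop the rest. The helper name `evict_kv_cache` and the `scores` input are hypothetical placeholders, not part of EpiKV.

```python
def evict_kv_cache(keys, values, scores, budget):
    """Keep the `budget` highest-scoring KV cache entries (hypothetical helper).

    `scores` stands in for any per-token importance score; it is NOT
    EpiKV's epiphany score, which is not reproduced here.
    Returns the kept keys, the kept values, and their original positions.
    """
    if not (len(keys) == len(values) == len(scores)):
        raise ValueError("keys, values, and scores must have the same length")
    if budget < 0:
        raise ValueError("budget must be non-negative")
    if budget >= len(keys):
        return list(keys), list(values), list(range(len(keys)))
    # Rank positions by descending score; ties go to the earlier position.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    # Re-sort the survivors so the cache stays in positional order.
    kept = sorted(ranked[:budget])
    return [keys[i] for i in kept], [values[i] for i in kept], kept


keys = ["k0", "k1", "k2", "k3", "k4"]
values = ["v0", "v1", "v2", "v3", "v4"]
scores = [0.1, 0.9, 0.3, 0.8, 0.2]  # made-up scores for illustration
print(evict_kv_cache(keys, values, scores, budget=2))
# prints (['k1', 'k3'], ['v1', 'v3'], [1, 3])
```

Whatever score is plugged in, the budget caps the cache at `budget` entries; the methods differ in how they decide which entries are worth keeping.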
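
Several of the summaries, such as the lifelong-learning and Kwai Summary Attention items, point to the quadratic complexity of standard softmax attention. A minimal sketch of single-head causal scaled dot-product attention in plain Python makes the source of that cost visible: every query scores every earlier key, so a length-n sequence produces n(n+1)/2 scores. The function names are illustrative and not taken from any of the papers.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_softmax_attention(Q, K, V):
    """Single-head causal scaled dot-product attention over lists of vectors.

    Returns the outputs and the number of query-key scores computed,
    which grows quadratically with the sequence length.
    """
    d = len(Q[0])
    scale = 1.0 / math.sqrt(d)
    outputs, num_scores = [], 0
    for i, q in enumerate(Q):
        # Causal masking: query i only scores keys 0..i.
        scores = [scale * sum(a * b for a, b in zip(q, K[j])) for j in range(i + 1)]
        num_scores += len(scores)
        weights = softmax(scores)
        # Output i is the attention-weighted average of the visible values.
        outputs.append([sum(w * V[j][c] for j, w in enumerate(weights))
                        for c in range(len(V[0]))])
    return outputs, num_scores


Q = K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
out, num_scores = causal_softmax_attention(Q, K, V)
print(num_scores)  # prints 10, i.e. 4 * 5 / 2
```

Doubling the sequence length roughly quadruples `num_scores`, and the keys and values kept for those scores form the KV cache that grows with every generated token.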
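
The routing-pathologies item is truncated, so its argument is not reproduced here. The sketch below only illustrates one property of softmax that is commonly raised in discussions of attention sinks: the weights always sum to 1, so a query that matches no key better than any other still has to spread its full mass somewhere, and a position with a consistently high logit (a "sink") can absorb most of it. The logits, including the sink logit of 5.0, are made-up numbers.

```python
import math


def softmax(xs):
    # Numerically stable softmax; the outputs always sum to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# A query that is equally unrelated to four content keys (all logits 0).
content_logits = [0.0, 0.0, 0.0, 0.0]
print(softmax(content_logits))  # prints [0.25, 0.25, 0.25, 0.25]

# Prepend a hypothetical sink position with a high logit of 5.0.
with_sink = softmax([5.0] + content_logits)
print(round(with_sink[0], 3))  # the sink takes most of the mass
```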
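
The Blurry Window Attention summary stops before describing how BLA works, so the following is not BLA. For comparison, here is a minimal sketch of the plain sliding-window causal mask, a common baseline for bounding attention cost on long contexts: each query attends only to itself and the previous `window - 1` positions, which caps both the per-token attention cost and the number of cached keys and values at `window`. The helper name is hypothetical.

```python
def sliding_window_causal_mask(n, window):
    """mask[i][j] is True when query i may attend to key j (hypothetical helper).

    Query i sees keys j with i - window < j <= i, i.e. at most `window` keys.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [[j <= i and i - j < window for j in range(n)] for i in range(n)]


for row in sliding_window_causal_mask(5, window=2):
    print("".join("x" if allowed else "." for allowed in row))
# prints:
# x....
# xx...
# .xx..
# ..xx.
# ...xx
```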
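
The two Test-Time Training items describe converting pretrained softmax attention into linear-complexity layers, but the summaries do not give the TTT formulation. As a simpler, related reference point, kernelized linear attention replaces the exponential similarity with a dot product of positive features; the elu(x) + 1 feature map used here is an assumption. This lets the causal output be computed either in parallel or recurrently from a fixed-size running state, so the cost grows linearly with sequence length. The sketch checks that both forms agree; it is not the papers' method.

```python
import math


def phi(x):
    # elu(x) + 1 feature map: x + 1 for x > 0, exp(x) otherwise (always positive).
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def linear_attention_parallel(Q, K, V):
    # Quadratic-time reference: explicit weights phi(q_i) . phi(k_j) for j <= i.
    out = []
    for i, q in enumerate(Q):
        fq = phi(q)
        w = [sum(a * b for a, b in zip(fq, phi(K[j]))) for j in range(i + 1)]
        total = sum(w)
        out.append([sum(w[j] * V[j][c] for j in range(i + 1)) / total
                    for c in range(len(V[0]))])
    return out


def linear_attention_recurrent(Q, K, V):
    # Linear-time form: carry S = sum phi(k) v^T and z = sum phi(k) across steps.
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    z = [0.0] * d_k
    out = []
    for q, k, v in zip(Q, K, V):
        fk = phi(k)
        for a in range(d_k):
            z[a] += fk[a]
            for c in range(d_v):
                S[a][c] += fk[a] * v[c]
        fq = phi(q)
        denom = sum(fq[a] * z[a] for a in range(d_k))
        out.append([sum(fq[a] * S[a][c] for a in range(d_k)) / denom
                    for c in range(d_v)])
    return out


Q = [[0.5, -1.0], [1.0, 0.2], [-0.3, 0.8]]
K = [[1.0, 0.0], [-0.5, 0.5], [0.2, -0.7]]
V = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
par = linear_attention_parallel(Q, K, V)
rec = linear_attention_recurrent(Q, K, V)
# Largest difference between the two forms: floating-point rounding only.
print(max(abs(a - b) for r1, r2 in zip(par, rec) for a, b in zip(r1, r2)))
```

The recurrent form never materializes the n×n weight matrix: its state `S` and `z` have a fixed size regardless of how many tokens have been processed.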
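
The in-context learning summary is truncated before the details of either paper. A well-known, simpler construction from earlier work shows the flavor of such results: for in-context linear regression, a linear (softmax-free) attention layer whose query is the test input, whose keys are the example inputs, and whose values are the example targets, scaled by `lr / N`, makes exactly the same prediction as one gradient-descent step from w = 0 on the squared loss. This covers linear regression with unnormalized attention, not the logistic-regression setting mentioned in the summary, and the function names are hypothetical.

```python
def gd_one_step_prediction(xs, ys, x_query, lr):
    # One gradient step from w = 0 on L(w) = (1 / 2N) * sum_j (w . x_j - y_j)^2.
    # At w = 0 the gradient is -(1 / N) * sum_j y_j x_j.
    n, d = len(xs), len(xs[0])
    grad = [-sum(y * x[c] for x, y in zip(xs, ys)) / n for c in range(d)]
    w = [-lr * g for g in grad]
    return sum(wc * qc for wc, qc in zip(w, x_query))


def linear_attention_prediction(xs, ys, x_query, lr):
    # Unnormalized attention: scores x_query . x_j, values y_j, scaled by lr / N.
    n = len(xs)
    scores = [sum(a * b for a, b in zip(x_query, x)) for x in xs]
    return (lr / n) * sum(s * y for s, y in zip(scores, ys))


xs = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
ys = [1.0, -2.0, 0.5]
x_query = [0.2, 0.7]
print(gd_one_step_prediction(xs, ys, x_query, lr=0.1))
print(linear_attention_prediction(xs, ys, x_query, lr=0.1))
```

Both lines print the same value (about 0.1033) up to floating-point rounding, because both reduce to `(lr / N) * sum_j y_j * (x_j . x_query)`.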
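
The Sigmoid Attention summary does not spell out the formulation used for the biological foundation models. In the general form of sigmoid attention, each query-key score passes through an elementwise sigmoid instead of a row-wise softmax, so the weights are not forced to sum to 1. The sketch below assumes a scaled dot-product score and a bias of -log(n), a choice proposed in earlier work on sigmoid self-attention; both are assumptions rather than details from the summarized paper, and the function name is hypothetical.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_attention_row(q, keys, values, bias=None):
    """One query's output under sigmoid attention (hypothetical helper).

    Weights are sigmoid(q . k / sqrt(d) + bias), applied independently per key.
    The default bias of -log(n) is an assumption, not taken from the paper.
    """
    n, d = len(keys), len(q)
    if bias is None:
        bias = -math.log(n)
    weights = [sigmoid(sum(a * b for a, b in zip(q, k)) / math.sqrt(d) + bias)
               for k in keys]
    out = [sum(w * v[c] for w, v in zip(weights, values))
           for c in range(len(values[0]))]
    return out, weights


out, weights = sigmoid_attention_row([0.0, 0.0], [[1.0, 0.0]] * 4, [[1.0]] * 4)
print(weights)       # each score is 0, so every weight is sigmoid(-log 4), about 0.2
print(sum(weights))  # about 0.8: sigmoid weights need not sum to 1
```

Because each weight depends only on its own score, the weights for different keys can be computed independently, without the max-and-sum reduction across keys that softmax requires.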