softmax attention
PulseAugur coverage of softmax attention — every cluster mentioning softmax attention across labs, papers, and developer communities, ranked by signal.
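For orientation, the softmax attention all the clusters below revolve around is standard scaled dot-product attention. A minimal single-head NumPy sketch (generic, not tied to any paper in this feed):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def softmax_attention(Q, K, V):
    """Single-head scaled dot-product attention.
    Q: (n, d), K: (m, d), V: (m, dv) -> output (n, dv)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # (n, m): all query-key similarities
    weights = softmax(scores, axis=-1)   # each row is a distribution over keys
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 8))
out = softmax_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

The (n, m) score matrix is what gives softmax attention its quadratic cost in sequence length, the bottleneck several clusters below target.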
-
New research explains how transformers perform in-context learning via gradient descent
Two new arXiv papers explore the theoretical underpinnings of in-context learning (ICL) in transformers. One paper demonstrates how transformers can perform in-context logistic regression by implicitly executing normali…
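The in-context logistic regression task referenced above amounts to fitting a linear classifier from the examples in the prompt. A generic gradient-descent sketch of that task (illustrating the optimization the papers argue transformers implicitly perform, not either paper's actual construction):

```python
import numpy as np

def logistic_gd_step(w, X, y, lr=0.1):
    """One gradient-descent step on the mean logistic loss.
    X: (n, d) in-context inputs, y: (n,) labels in {0, 1}, w: (d,) weights."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))  # predicted probabilities
    grad = X.T @ (p - y) / len(y)       # gradient of the mean log-loss
    return w - lr * grad

rng = np.random.default_rng(3)
X = rng.normal(size=(32, 4))
w_true = np.array([1.0, -2.0, 0.5, 0.0])  # hypothetical ground-truth direction
y = (X @ w_true > 0).astype(float)

w = np.zeros(4)
for _ in range(200):
    w = logistic_gd_step(w, X, y, lr=0.5)

acc = float(((X @ w > 0) == (y > 0.5)).mean())
print(acc)
```

Each transformer layer, on this view, corresponds roughly to one such update applied to the prompt's examples.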
-
Linearizing Vision Transformer with Test-Time Training
Researchers have developed a method to adapt pretrained softmax-attention models to linear-complexity architectures using Test-Time Training (TTT). This approach addresses the representational gap between different atte…
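For contrast with the quadratic softmax form, linear-complexity attention replaces the softmax with a feature map so the key-value sum can be precomputed once. A generic kernelized sketch using the common elu(x)+1 feature map (this illustrates linear attention in general, not the paper's TTT adaptation):

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1 keeps features strictly positive; a common linear-attention choice.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V, eps=1e-6):
    """O(n * d * dv) attention: associativity lets K^T V be computed once,
    instead of materializing the (n, m) score matrix."""
    Qf, Kf = feature_map(Q), feature_map(K)   # (n, d), (m, d)
    KV = Kf.T @ V                             # (d, dv): summary of all keys/values
    Z = Qf @ Kf.sum(axis=0) + eps             # (n,): per-query normalizer
    return (Qf @ KV) / Z[:, None]

rng = np.random.default_rng(1)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 8))
out = linear_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

The representational gap the teaser mentions comes from exactly this substitution: the feature-map kernel only approximates the exponential similarity of softmax.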
-
Transformers' expressive power explained by new measure-theoretic framework
Researchers have introduced a new measure-theoretic framework to understand the expressive power of Transformer architectures in modeling contextual relations. This framework connects standard softmax attention to entro…
-
Sigmoid attention improves biological foundation models with faster, stable training
Researchers have developed a new attention mechanism called Sigmoid Attention that offers faster, more stable training for biological foundation models. This approach leads to better learned representations…
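The core change in sigmoid-style attention is swapping the row-wise softmax for an elementwise sigmoid, so each key's weight is computed independently and rows no longer sum to 1. A minimal sketch, assuming the scalar bias term used in the sigmoid-attention literature (not necessarily this cluster's exact recipe):

```python
import numpy as np

def sigmoid_attention(Q, K, V, bias=None):
    """Scaled dot-product attention with an elementwise sigmoid in place of
    the row-wise softmax; weights are independent per (query, key) pair."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    if bias is None:
        # Assumed default: -log(seq_len) damps weights as context grows.
        bias = -np.log(K.shape[0])
    weights = 1.0 / (1.0 + np.exp(-(scores + bias)))  # each entry in (0, 1)
    return weights @ V

rng = np.random.default_rng(2)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 8))
out = sigmoid_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Dropping the softmax normalization removes the row-wise coupling between keys, which is one reason such variants can train more stably and run faster.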
-
Kwai Summary Attention compresses historical contexts for efficient long-context LLMs
Researchers have introduced Kwai Summary Attention (KSA), a novel attention mechanism designed to address the quadratic time complexity of standard softmax attention in large language models. KSA aims to maintain a line…
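The general idea of compressing historical context while keeping recent tokens exact can be sketched as follows. This is a toy illustration built on assumptions (mean-pooled block summaries, a fixed recent window), not KSA's actual mechanism:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def summary_attention(q, K, V, window=8, block=4):
    """Toy sketch: the query attends exactly over the last `window` tokens,
    and over mean-pooled `block`-sized summaries of everything older,
    shrinking the attended set from len(K) to len(K)//block + window."""
    Kh, Vh = K[:-window], V[:-window]   # historical context, to be compressed
    Kw, Vw = K[-window:], V[-window:]   # recent window, kept exact
    nb = len(Kh) // block
    Ks = Kh[:nb * block].reshape(nb, block, -1).mean(axis=1)  # summary keys
    Vs = Vh[:nb * block].reshape(nb, block, -1).mean(axis=1)  # summary values
    Kc = np.concatenate([Ks, Kw])
    Vc = np.concatenate([Vs, Vw])
    w = softmax(q @ Kc.T / np.sqrt(q.shape[-1]))
    return w @ Vc

rng = np.random.default_rng(4)
K = rng.normal(size=(64, 16))
V = rng.normal(size=(64, 16))
q = rng.normal(size=(16,))
out = summary_attention(q, K, V)
print(out.shape)  # (16,)
```

With a fixed summary granularity, the attended set grows linearly (not quadratically) with context length, the complexity target the teaser describes.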
-
New architectures and frameworks target LLM serving bottlenecks for long contexts
Researchers have developed novel architectures and techniques to address the escalating latency and energy consumption challenges in serving large language models (LLMs) with long contexts. One approach, AMMA, proposes …