softmax attention
PulseAugur coverage of softmax attention: every cluster mentioning softmax attention across labs, papers, and developer communities, ranked by signal.
-
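New research explains how Transformers perform in-context learning via gradient descent
Two new arXiv papers examine the theoretical foundations of in-context learning (ICL) in Transformers. One paper shows how Transformers can perform in-context logistic regression by implicitly executing a normalized gradient descent step within each layer. The other studies nonlinear regression, showing how the attention mechanism constructs features that let Transformers learn from examples without updating their weights.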
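The gradient-descent view of ICL is easiest to see in the simplest case. The sketch below uses the widely cited linear-regression construction, not either paper's logistic or nonlinear setup: starting from w = 0, one gradient step on the mean squared error over the in-context examples produces exactly the same prediction as an unnormalized (linear, no softmax) attention read-out with the query as query, the example inputs as keys, and the example targets as values. The function names and the toy data are illustrative assumptions.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def gd_one_step_prediction(xs, ys, x_query, lr):
    """Predict y at x_query after one gradient step on (1/2n) * sum (w.x - y)^2, from w = 0."""
    n, d = len(xs), len(x_query)
    w = [0.0] * d
    grad = [0.0] * d
    for x, y in zip(xs, ys):
        err = dot(w, x) - y  # residual of the current (zero) weights
        for j in range(d):
            grad[j] += err * x[j] / n
    w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return dot(w, x_query)


def linear_attention_prediction(xs, ys, x_query, lr):
    """Unnormalized attention: query = x_query, keys = xs, values = ys, scaled by lr / n."""
    n = len(xs)
    return (lr / n) * sum(y * dot(x, x_query) for x, y in zip(xs, ys))


xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # in-context example inputs (toy data)
ys = [2.0, -1.0, 1.0]                      # in-context example targets
x_query = [0.5, 2.0]
print(gd_one_step_prediction(xs, ys, x_query, lr=0.1))
print(linear_attention_prediction(xs, ys, x_query, lr=0.1))
```

Both printed values agree up to floating-point rounding, because the one-step weights are (lr / n) * sum(y_i * x_i), so the prediction reduces to a dot-product-weighted sum of the targets. Softmax attention and normalized gradient steps complicate this picture, which is the gap the papers above address.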
-
Linearizing Vision Transformer with Test-Time Training
Researchers have developed a method to adapt pretrained softmax attention models to linear-complexity architectures using Test-Time Training (TTT). This approach addresses the representational gap between different atte…
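To make the complexity gap concrete, here is a minimal sketch contrasting causal softmax attention, which scores every query against every earlier key (O(n^2) for n tokens), with a generic kernelized linear attention that replaces the exponential similarity by phi(q).phi(k). Because that similarity factorizes, the same outputs can be computed in O(n) by carrying a fixed-size running state. The elu(x)+1 feature map is a common choice that we assume here; this is not the paper's TTT procedure, and all names are illustrative.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax_attention_causal(qs, ks, vs):
    """Standard causal softmax attention: O(n^2) score computations for n tokens."""
    out = []
    for t, q in enumerate(qs):
        scores = [dot(q, ks[s]) / math.sqrt(len(q)) for s in range(t + 1)]
        m = max(scores)  # subtract the max for numerical stability
        w = [math.exp(sc - m) for sc in scores]
        z = sum(w)
        out.append([sum(w[s] * vs[s][j] for s in range(t + 1)) / z for j in range(len(vs[0]))])
    return out


def phi(x):
    """elu(x) + 1 feature map (assumed); keeps every feature strictly positive."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def linear_attention_quadratic(qs, ks, vs):
    """Linear attention written like softmax attention, with phi(q).phi(k) as the similarity."""
    out = []
    for t, q in enumerate(qs):
        fq = phi(q)
        w = [dot(fq, phi(ks[s])) for s in range(t + 1)]
        z = sum(w)
        out.append([sum(w[s] * vs[s][j] for s in range(t + 1)) / z for j in range(len(vs[0]))])
    return out


def linear_attention_recurrent(qs, ks, vs):
    """Same outputs in O(n): carry S = sum phi(k) v^T and z = sum phi(k) as a fixed-size state."""
    d_k, d_v = len(ks[0]), len(vs[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    z = [0.0] * d_k
    out = []
    for q, k, v in zip(qs, ks, vs):
        fk = phi(k)
        for i in range(d_k):
            z[i] += fk[i]
            for j in range(d_v):
                S[i][j] += fk[i] * v[j]
        fq = phi(q)
        out.append([sum(fq[i] * S[i][j] for i in range(d_k)) / dot(fq, z) for j in range(d_v)])
    return out


qs = [[0.2, -0.5], [1.0, 0.3], [-0.4, 0.8]]
ks = [[0.1, 0.9], [-0.7, 0.2], [0.5, -0.3]]
vs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(softmax_attention_causal(qs, ks, vs))
print(linear_attention_recurrent(qs, ks, vs))
```

The recurrent and quadratic linear forms agree exactly up to rounding, while the softmax outputs generally differ from both. That difference in which token mixtures each form can express is one way to read the "representational gap" that adapting a pretrained softmax model has to close.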
-
Transformers' expressive power explained by new measure-theoretic framework
Researchers have introduced a new measure-theoretic framework to understand the expressive power of Transformer architectures in modeling contextual relations. This framework connects standard softmax attention to entro…
-
Sigmoid attention improves biological foundation models with faster, stable training
Researchers have developed a new attention mechanism called Sigmoid Attention, which offers significant improvements for training biological foundation models. This novel approach leads to better learned representations…
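The core difference from softmax attention can be sketched in a few lines: each query-key score passes through an elementwise sigmoid instead of a softmax, so every weight lies in (0, 1) independently and no normalization across keys is needed. The bias b = -log(n) used as the default below is an assumption (one commonly suggested choice to keep the total attention mass bounded early in training), not necessarily what the biological foundation models use.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_attention_weights(q, ks, bias=None):
    """Each key gets an independent weight in (0, 1); there is no normalization across keys."""
    if bias is None:
        bias = -math.log(len(ks))  # assumed default: b = -log(n)
    scale = 1.0 / math.sqrt(len(q))
    return [sigmoid(dot(q, k) * scale + bias) for k in ks]


def sigmoid_attention(qs, ks, vs, bias=None):
    """Weighted sum of values using per-key sigmoid weights."""
    out = []
    for q in qs:
        w = sigmoid_attention_weights(q, ks, bias)
        out.append([sum(wi * v[j] for wi, v in zip(w, vs)) for j in range(len(vs[0]))])
    return out


ks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
vs = [[1.0], [2.0], [3.0], [4.0]]
print(sigmoid_attention_weights([0.0, 0.0], ks))
print(sigmoid_attention([[0.5, -0.2]], ks, vs))
```

With a zero query every score is 0, so each weight equals sigmoid(-log n) = 1/(n + 1), which is 0.2 for the four keys above. Dropping the row-wise max and sum that softmax requires is one reason sigmoid attention is often described as kernel-friendly; whether that is the source of the reported training speedups is not something this sketch shows.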
-
Kwai Summary Attention compresses historical contexts for efficient long-context LLMs
Researchers have introduced Kwai Summary Attention (KSA), a novel attention mechanism designed to address the quadratic time complexity of standard softmax attention in large language models. KSA aims to maintain a line…
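The general idea of compressing history can be illustrated with a hypothetical sketch; it is not KSA's algorithm, whose details are not given here. Older tokens are grouped into chunks and each chunk is mean-pooled into one summary key and value (the mean pooling, `window`, and `chunk_size` are all our assumptions), while the most recent `window` tokens are kept exact. A query then attends over roughly n / chunk_size + window entries instead of n.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax_attend(q, ks, vs):
    """Ordinary softmax attention for a single query."""
    scores = [dot(q, k) / math.sqrt(len(q)) for k in ks]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, vs)) / z for j in range(len(vs[0]))]


def mean_pool(rows):
    """Average a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def compress_history(ks, vs, window, chunk_size):
    """Hypothetical: keep the last `window` tokens exact, mean-pool older ones chunk by chunk."""
    split = max(len(ks) - window, 0)
    sum_k = [mean_pool(ks[i:min(i + chunk_size, split)]) for i in range(0, split, chunk_size)]
    sum_v = [mean_pool(vs[i:min(i + chunk_size, split)]) for i in range(0, split, chunk_size)]
    return sum_k + ks[split:], sum_v + vs[split:]


ks = [[math.sin(i), math.cos(i)] for i in range(10)]
vs = [[float(i)] for i in range(10)]
ck, cv = compress_history(ks, vs, window=2, chunk_size=4)
print(len(ck))  # 2 summaries for tokens 0-7, plus 2 exact recent tokens
print(softmax_attend([1.0, 0.0], ck, cv))
```

With `chunk_size=1` nothing is pooled and the result matches full softmax attention, which is a useful sanity check; larger chunks trade fidelity on distant tokens for a shorter attention span. A learned compressor would replace `mean_pool` in a real system.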
-
New architectures and frameworks target LLM serving bottlenecks for long contexts
Researchers have developed novel architectures and techniques to address the escalating latency and energy consumption challenges in serving large language models (LLMs) with long contexts. One approach, AMMA, proposes …
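A back-of-the-envelope calculation shows why long contexts strain serving: a decoder caches one key and one value vector per layer, per KV head, per token, so cache memory grows linearly with context length. The model shape below (32 layers, 8 KV heads, head dimension 128, 2-byte fp16 elements) is a hypothetical example, not taken from AMMA or the other work above.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size=1, bytes_per_elem=2):
    """Bytes needed to cache keys and values for every layer, KV head, and token."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem  # 2 = one K and one V
    return per_token * seq_len * batch_size


# Hypothetical model shape, chosen only for illustration.
cfg = dict(num_layers=32, num_kv_heads=8, head_dim=128)
for seq_len in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(seq_len=seq_len, **cfg) / 2**30
    print(f"{seq_len:>7} tokens: {gib:.1f} GiB")
```

For this shape each token costs 2 x 32 x 8 x 128 x 2 = 131,072 bytes (128 KiB), so a single 131,072-token sequence needs 16 GiB of KV cache before counting weights or activations, and every decode step must read that cache back from memory.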