softmax attention
PulseAugur coverage of softmax attention — every cluster mentioning softmax attention across labs, papers, and developer communities, ranked by signal.
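For orientation, the softmax attention all the clusters below revolve around is standard scaled dot-product attention. A minimal single-head NumPy sketch (generic, not tied to any paper in this feed):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def softmax_attention(Q, K, V):
    """Single-head scaled dot-product attention.
    Q: (n, d), K: (m, d), V: (m, dv) -> output (n, dv)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # (n, m): all query-key similarities
    weights = softmax(scores, axis=-1)   # each row is a distribution over keys
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 8))
out = softmax_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

The (n, m) score matrix is what gives softmax attention its quadratic cost in sequence length, the bottleneck several clusters below target.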
-
New research explains how transformers perform in-context learning via gradient descent
Two new arXiv papers explore the theoretical underpinnings of in-context learning (ICL) in transformers. One paper demonstrates how transformers can perform in-context logistic regression by implicitly executing normali…
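The in-context logistic regression task referenced above amounts to fitting a linear classifier from the examples in the prompt. A generic gradient-descent sketch of that task (illustrating the optimization the papers argue transformers implicitly perform, not either paper's actual construction):

```python
import numpy as np

def logistic_gd_step(w, X, y, lr=0.1):
    """One gradient-descent step on the mean logistic loss.
    X: (n, d) in-context inputs, y: (n,) labels in {0, 1}, w: (d,) weights."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))  # predicted probabilities
    grad = X.T @ (p - y) / len(y)       # gradient of the mean log-loss
    return w - lr * grad

rng = np.random.default_rng(3)
X = rng.normal(size=(32, 4))
w_true = np.array([1.0, -2.0, 0.5, 0.0])  # hypothetical ground-truth direction
y = (X @ w_true > 0).astype(float)

w = np.zeros(4)
for _ in range(200):
    w = logistic_gd_step(w, X, y, lr=0.5)

acc = float(((X @ w > 0) == (y > 0.5)).mean())
print(acc)
```

Each transformer layer, on this view, corresponds roughly to one such update applied to the prompt's examples.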
-
Linearizing Vision Transformer with Test-Time Training
Researchers have developed a method to adapt pretrained softmax-attention models to linear-complexity architectures using Test-Time Training (TTT). This approach addresses the representational gap between different atte…
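For contrast with the quadratic softmax form, linear-complexity attention replaces the softmax with a feature map so the key-value sum can be precomputed once. A generic kernelized sketch using the common elu(x)+1 feature map (this illustrates linear attention in general, not the paper's TTT adaptation):

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1 keeps features strictly positive; a common linear-attention choice.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V, eps=1e-6):
    """O(n * d * dv) attention: associativity lets K^T V be computed once,
    instead of materializing the (n, m) score matrix."""
    Qf, Kf = feature_map(Q), feature_map(K)   # (n, d), (m, d)
    KV = Kf.T @ V                             # (d, dv): summary of all keys/values
    Z = Qf @ Kf.sum(axis=0) + eps             # (n,): per-query normalizer
    return (Qf @ KV) / Z[:, None]

rng = np.random.default_rng(1)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 8))
out = linear_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

The representational gap the teaser mentions comes from exactly this substitution: the feature-map kernel only approximates the exponential similarity of softmax.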
-
Transformers' expressive power explained by new measure-theoretic framework
Researchers have introduced a new measure-theoretic framework to understand the expressive power of Transformer architectures in modeling contextual relations. This framework connects standard softmax attention to entro…
-
Sigmoid attention improves biological foundation models with faster, stable training
Researchers have developed a new attention mechanism called Sigmoid Attention that offers faster, more stable training for biological foundation models. This approach leads to better learned representations…
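The core change in sigmoid-style attention is swapping the row-wise softmax for an elementwise sigmoid, so each key's weight is computed independently and rows no longer sum to 1. A minimal sketch, assuming the scalar bias term used in the sigmoid-attention literature (not necessarily this cluster's exact recipe):

```python
import numpy as np

def sigmoid_attention(Q, K, V, bias=None):
    """Scaled dot-product attention with an elementwise sigmoid in place of
    the row-wise softmax; weights are independent per (query, key) pair."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    if bias is None:
        # Assumed default: -log(seq_len) damps weights as context grows.
        bias = -np.log(K.shape[0])
    weights = 1.0 / (1.0 + np.exp(-(scores + bias)))  # each entry in (0, 1)
    return weights @ V

rng = np.random.default_rng(2)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 8))
out = sigmoid_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Dropping the softmax normalization removes the row-wise coupling between keys, which is one reason such variants can train more stably and run faster.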
-
Kwai Summary Attention compresses historical contexts for efficient long-context LLMs
Researchers have introduced Kwai Summary Attention (KSA), a novel attention mechanism designed to address the quadratic time complexity of standard softmax attention in large language models. KSA aims to maintain a line…
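The general idea of compressing historical context while keeping recent tokens exact can be sketched as follows. This is a toy illustration built on assumptions (mean-pooled block summaries, a fixed recent window), not KSA's actual mechanism:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def summary_attention(q, K, V, window=8, block=4):
    """Toy sketch: the query attends exactly over the last `window` tokens,
    and over mean-pooled `block`-sized summaries of everything older,
    shrinking the attended set from len(K) to len(K)//block + window."""
    Kh, Vh = K[:-window], V[:-window]   # historical context, to be compressed
    Kw, Vw = K[-window:], V[-window:]   # recent window, kept exact
    nb = len(Kh) // block
    Ks = Kh[:nb * block].reshape(nb, block, -1).mean(axis=1)  # summary keys
    Vs = Vh[:nb * block].reshape(nb, block, -1).mean(axis=1)  # summary values
    Kc = np.concatenate([Ks, Kw])
    Vc = np.concatenate([Vs, Vw])
    w = softmax(q @ Kc.T / np.sqrt(q.shape[-1]))
    return w @ Vc

rng = np.random.default_rng(4)
K = rng.normal(size=(64, 16))
V = rng.normal(size=(64, 16))
q = rng.normal(size=(16,))
out = summary_attention(q, K, V)
print(out.shape)  # (16,)
```

With a fixed summary granularity, the attended set grows linearly (not quadratically) with context length, the complexity target the teaser describes.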
-
New architectures and frameworks target LLM serving bottlenecks for long contexts
Researchers have developed novel architectures and techniques to address the escalating latency and energy consumption challenges in serving large language models (LLMs) with long contexts. One approach, AMMA, proposes …