ENTITY linear attention

PulseAugur coverage of linear attention — every cluster mentioning linear attention across labs, papers, and developer communities, ranked by signal.

Total · 30d

12 over 90d

Releases · 30d

0 over 90d

Papers · 30d

12 over 90d

SENTIMENT · 30D

5 days with sentiment data

RECENT · PAGE 1/1 · 12 TOTAL

RESEARCH · CL_109619 · Jun 24 · 03:14

Lifelong AI Learning Needs Parametric Attention in Transformers, Paper Argues

A new research paper proposes that achieving lifelong continual learning in AI agents necessitates the use of parametric forms of attention within transformer models. The paper argues that the current quadratic complexi…

RESEARCH · CL_103889 · Jun 18 · 00:00

HydraHead architecture fuses attention types for improved long-context LLMs

Researchers have introduced HydraHead, a novel architecture that hybridizes Full Attention and Linear Attention at the head level within transformer models. This approach leverages interpretability to identify critical …

RESEARCH · CL_93108 · Jun 15 · 00:00

New research explores hybrid and sparse attention mechanisms for LLMs

Researchers are exploring novel methods to optimize attention mechanisms in large language models, particularly for handling long contexts. The HydraHead architecture, for instance, hybridizes Full Attention (FA) and Li…

RESEARCH · CL_84359 · Jun 10 · 13:26

Bayesian theory explains emergent copy heads in transformer attention

Researchers have developed a Bayesian theory to explain the emergence of "copy heads" in transformer attention mechanisms. Their analysis of a single-layer softmax attention network reveals a phase transition in how the…

TOOL · CL_82518 · Jun 10 · 04:00

Blurry Window Attention improves Transformer efficiency for long contexts

Researchers have introduced Blurry Window Attention (BLA), a novel method designed to improve the efficiency of Transformer language models in handling long contexts. BLA addresses the quadratic complexity and growing K…

RESEARCH · CL_77141 · Jun 5 · 01:35

New model explains how training diversity boosts transformer in-context learning

Researchers have developed an analytical model to explain how training task diversity influences in-context learning (ICL) in transformers. The model, which treats training task vectors as low-rank Gaussians, demonstrat…

RESEARCH · CL_62204 · May 29 · 11:13

New framework unifies sequence models using Bayesian memory

Researchers have introduced a "design-model" framework for creating efficient recurrent sequence maps based on memory assumptions. This framework uses Bayesian filtering to write evidence into memory and a query-depende…

RESEARCH · CL_43909 · May 21 · 17:44

NVIDIA unveils Gated DeltaNet-2 for improved linear attention

NVIDIA has introduced Gated DeltaNet-2, a new linear attention layer designed to improve memory editing in recurrent neural networks. This model separates the processes of erasing old information and writing new informa…

TOOL · CL_30774 · May 13 · 12:59

OSDN improves linear attention with online preconditioning

Researchers have introduced OSDN, a novel method that enhances linear attention mechanisms by incorporating provable online preconditioning. This technique augments the Delta Rule with a diagonal preconditioner, which i…

RESEARCH · CL_34499 · May 11 · 20:03

New attention methods tackle LLM long-context challenges

Researchers are developing new attention mechanisms to handle increasingly long contexts in large language models. One approach, Runtime-Certified Bounded-Error Quantized Attention, uses tiered KV caches to compress mem…

TOOL · CL_25583 · May 8 · 13:59

Recurrent models fail at state tracking due to error dynamics

Researchers have introduced a new perspective on state tracking within recurrent neural network architectures, emphasizing error control dynamics over theoretical expressive capacity. They demonstrate that affine recurr…

RESEARCH · CL_05127 · Apr 27 · 04:00

StateX framework boosts RNN recall by expanding model states post-training

Researchers have developed StateX, a post-training framework designed to improve the recall capabilities of recurrent neural networks (RNNs). This method efficiently expands the states of pre-trained RNNs, such as linea…
