Language Modeling
PulseAugur coverage of Language Modeling — every cluster mentioning Language Modeling across labs, papers, and developer communities, ranked by signal.
-
Flexformer introduces learnable attention kernels for efficient Transformers
Researchers have introduced Flexformer, a novel linear Transformer architecture designed to overcome the quadratic complexity limitations of traditional Transformers. Flexformer achieves this by learning attention kerne…
-
Researchers analyze transformer expressivity using formal grammars
A new research paper analyzes the expressivity of deep transformer models by examining their ability to represent hierarchical structures. The study uses bounded-depth, non-recursive context-free grammars to construct t…
-
Transformer study finds QKV projection sharing slashes memory use
Researchers have investigated the necessity of three distinct projections (query, key, and value) in Transformer models. Their study found that sharing projections, particularly the Q-K=V variant, can significantly redu…
-
Switch Attention dynamically routes between full and sliding window attention
Researchers have introduced Switch Attention (SwiAttn), a novel hybrid transformer architecture designed to address the computational bottleneck of standard full attention mechanisms in long-context language modeling. S…