transformer language models
PulseAugur coverage of transformer language models: every cluster that mentions transformer language models across labs, papers, and developer communities, ranked by signal.
-
New research tracks mentalizing and situation modeling in Transformer language models
A new research paper explores the development of situation modeling and mentalizing capabilities in Transformer language models, specifically the Olmo2 and Pythia suites. The study found that accurate performance on fal…
-
Research links emergent AI capabilities to learning sparse attention patterns
A new research paper proposes that emergent capabilities in transformer language models arise randomly from the learning of sparse attention patterns. The study demonstrates that these capabilities, such as pattern comp…
-
Energy-based transformers show promise in predicting reading difficulty
Researchers have introduced a new class of transformer models called energy-based transformers, which offer a formal connection to associative memory models. In computational psycholinguistics, this energy measure has b…
-
Persistent homology tracks LLM representation changes during fine-tuning
Researchers have employed persistent homology to analyze the internal representation dynamics of large language models during supervised fine-tuning. Their study, which examined four transformer models (1B to 7B paramet…
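The study's exact filtration and homology dimensions are not stated here. As an assumed illustration of the general technique, 0-dimensional persistent homology of a point cloud (for example, hidden-state vectors collected from one layer) under the Vietoris-Rips filtration can be computed with union-find. Every point is born at scale 0, and a connected component dies at the edge length where it merges into another. The function name `h0_persistence` is hypothetical, and the sketch is not the paper's pipeline.

```python
import itertools
import math

def h0_persistence(points):
    """Hypothetical helper: death times of 0-dimensional features (connected
    components) in the Vietoris-Rips filtration of `points`.
    All components are born at 0; the last surviving one is reported as inf."""
    n = len(points)
    if n == 0:
        return []
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairwise edges in order of Euclidean length (Kruskal order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge, so one of them dies at this scale.
            parent[ri] = rj
            deaths.append(length)
    deaths.append(math.inf)  # the component that never dies
    return deaths

# Two tight, well-separated pairs: two early deaths, one late death.
print(h0_persistence([(0, 0), (0, 1), (10, 0), (10, 1)]))
```

Running the same computation on hidden states for a fixed set of prompts at two fine-tuning checkpoints and comparing the resulting death times is one rough way to see cluster structure shift. Higher-dimensional features (loops, voids) would need a dedicated library.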
-
New framework reveals geometric limits on transformer model feature representation
Researchers have developed a new framework to understand the geometric limits of feature representation in transformer language models. By analyzing the embedding matrix and its deviation from near-orthogonality, they i…
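The framework's precise deviation measure is not given here. One common, assumed way to quantify how far a set of embedding rows is from orthogonal is to look at absolute cosine similarities between distinct rows: the maximum (mutual coherence) and the mean. Because at most d nonzero vectors in a d-dimensional space can be mutually orthogonal, a vocabulary with more rows than dimensions must show some nonzero coherence. The helper `coherence_stats` below is hypothetical and written in pure Python for clarity rather than speed.

```python
import itertools
import math

def coherence_stats(embeddings):
    """Hypothetical helper: for a list of embedding vectors (rows of an
    embedding matrix), return (max_abs_cosine, mean_abs_cosine) over all
    distinct pairs. Both are 0 when every pair of rows is orthogonal."""
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0:
            raise ValueError("a zero vector has no direction")
        return [x / norm for x in v]

    unit = [normalize(v) for v in embeddings]
    # Absolute cosine similarity for every unordered pair of rows.
    cosines = [
        abs(sum(a * b for a, b in zip(u, w)))
        for u, w in itertools.combinations(unit, 2)
    ]
    if not cosines:
        return 0.0, 0.0
    return max(cosines), sum(cosines) / len(cosines)

# Three rows in two dimensions cannot all be orthogonal.
print(coherence_stats([[1, 0], [0, 1], [1, 1]]))
```

For a real model the rows would come from the token embedding matrix, and a vectorized Gram-matrix computation would replace the pairwise loop.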
-
Research reveals LLMs retain hidden concepts despite suppression
A new research paper explores the effectiveness of instruction-based suppression in large language models, finding that while models can be trained to avoid expressing prohibited content, the underlying concepts remain …
-
New GiLT model uses dependency graphs to boost Transformer language models
Researchers have developed GiLT, a new Transformer language model that incorporates dependency graphs to enhance syntactic generalization. Unlike previous methods that add structural tokens, GiLT integrates linguistic i…
-
Stochastic KV Routing enables adaptive depth-wise cache sharing for LLMs
Researchers have developed a new method called Stochastic KV Routing to reduce the memory footprint of transformer language models. This technique enables adaptive depth-wise cache sharing by training layers to randomly…