transformers

PulseAugur coverage of transformers — every cluster mentioning transformers across labs, papers, and developer communities, ranked by signal.

Total · 30d

167

167 over 90d

Releases · 30d

0 over 90d

Papers · 30d

118

118 over 90d

TIER MIX · 90D

frontier release 6
significant 6
research 54
tool 93
commentary 8

TOPICS

paper 118
model release 79
other 54
product 47
infra 25
safety 19
opinion 3
policy 1

RELATIONSHIPS

used by KV cache 90%
used by vLLM 70%
used by llama.cpp 70%
used by Ollama 70%
competes with State space models: Univariate representation of a multivariate model, partial interpolation and periodic convergence 70%
used by CNNs 70%
used by AdamW 70%
competes with State Space Models 70%
instance of grokking 70%
used by llama-cpp-python 70%
used by functional magnetic resonance imaging 70%
used by SGD 70%

TIMELINE

2026-05-13 research_milestone A paper was published analyzing the impact of data representation and tokenization on Transformer context effectiveness.

SENTIMENT · 30D

25 days with sentiment data

RECENT · PAGE 3/9 · 167 TOTAL

RESEARCH · CL_62196 · May 29 · 00:00

LLMs' arithmetic skills boosted by pedagogy and geometric analysis

Researchers are exploring how to improve large language models' (LLMs) arithmetic capabilities through novel training methods and geometric analysis. One approach uses Indonesian mathematics pedagogy to train a small GP…
TOOL · CL_66580 · May 27 · 00:00

AI training framed as Hamilton-Jacobi PDE problem

Researchers have formulated neural network training as a Hamilton-Jacobi initial-value problem. This framework connects gradient steps to solving viscous Hamilton-Jacobi equations, revealing shared mathematical structur…
RESEARCH · CL_64767 · May 26 · 09:09

JetBrains releases Mellum2 reasoning model with 131K context

JetBrains has released its Mellum2 model family, including the Mellum2-12B-A2.5B-Thinking variant, which is designed for complex reasoning tasks. This model utilizes a Mixture-of-Experts architecture with a large contex…
TOOL · CL_51343 · May 26 · 04:00

New Interdomain Attention Merges Transformers and SSMs

Researchers have introduced Interdomain Attention, a novel mechanism that merges the strengths of Transformers and deep state space models (SSMs). This new approach integrates an SSM into an attention module using kerne…
TOOL · CL_51303 · May 26 · 04:00

Transformers learn Spanish morphome differently than humans

Researchers investigated whether transformers can learn the Spanish L-shaped morphome, an irregular morphological pattern, by training models on varying frequencies of irregular verbs. The study found that while transfo…
TOOL · CL_51173 · May 26 · 04:00

Krause Attention improves Transformers with localized interactions

Researchers have introduced Krause Attention, a novel mechanism designed to improve Transformer models by addressing issues like representation collapse and attention sinks. This new approach replaces global aggregation…
RESEARCH · CL_51086 · May 26 · 04:00

Researchers Uncover How Transformers Achieve Analogical Reasoning

Two new research papers explore the mechanisms behind analogical reasoning in Transformer models. The first paper formalizes analogy as inferring correspondences between categories, identifying geometric alignment and f…
TOOL · CL_50923 · May 26 · 04:00

New method identifies attention-head circuits in transformers

Researchers have developed a novel three-step method called Spectral Probe-Circuits to identify specific computational circuits within pretrained transformer models. This technique uses a spectral signal to rank attenti…
TOOL · CL_60797 · May 25 · 19:37

Deep Learning Models Compared for Skin Cancer Detection

Researchers have conducted a comprehensive evaluation of twelve deep learning models for detecting skin cancer using a unified approach on the PAD-UFES-20 dataset. The study compared convolutional neural networks (CNNs)…
TOOL · CL_48970 · May 25 · 04:00

NextLat Transformers Learn Compact World Models for Better Generalization

Researchers have developed a new training method called Next-Latent Prediction (NextLat) for transformers, which encourages them to build more compact internal world models. This approach adds a self-supervised objectiv…
TOOL · CL_48893 · May 25 · 04:00

Certification Hard for Transformers and Circuits

A new research paper explores the difficulty of certifying the exact behavior of neural networks, particularly Transformers and circuits, even with minimal overparametrization. The study demonstrates that adding even a …
TOOL · CL_48721 · May 25 · 04:00

Tensor Cache enhances Transformer long-context memory

Researchers have developed a novel memory system called Tensor Cache for Transformers, designed to enhance their ability to handle long contexts. This system combines a sliding-window cache with a second-level fast-weig…
FRONTIER RELEASE · CL_58091 · May 23 · 02:13

Stepfun AI releases 198B parameter multimodal MoE model

Stepfun AI has released Step 3.7 Flash, a 198-billion parameter sparse Mixture-of-Experts (MoE) vision-language model. This model is optimized for agentic workflows, coding, and multimodal tasks, activating approximatel…
FRONTIER RELEASE · CL_69322 · May 23 · 01:17

Google DeepMind releases multimodal Gemma 4 12B models

Google DeepMind has released several variants of its Gemma 4 models, including the 12B parameter versions. These models are multimodal, capable of processing text, image, audio, and video inputs, with a focus on efficie…
TOOL · CL_45371 · May 23 · 00:55

Fixing local LLM OOM errors by optimizing KV cache and quantization

Running large open-source language models locally can lead to out-of-memory errors, even if the model's weights seem to fit within the available VRAM. This is primarily due to the significant memory required for the KV …
TOOL · CL_43454 · May 22 · 05:47

New CODA paper reframes Transformers as math problems

A new research paper introduces CODA, a novel approach to Transformers that reframes them as mathematical problems. This method aims to potentially revolutionize the architecture of neural networks. The paper is availab…
TOOL · CL_44898 · May 22 · 04:00

Transformers struggle with state-based decisions in search, new paper finds

Researchers have identified a critical limitation in how transformer models process serialized trajectory data during backtracking search. These models can struggle with 'scattered retrieval,' where state features are d…
TOOL · CL_44709 · May 22 · 04:00

LLM Pretraining Creates Generalizable Manifold for Time Series Forecasting

A new research paper explores how large language models (LLMs) pretrained on text can be effectively used for time-series forecasting. The study demonstrates that language pretraining equips transformers with a reusable…
RESEARCH · CL_48751 · May 22 · 00:00

LLMs and new frameworks boost GPU kernel optimization

Researchers are exploring novel ways to optimize GPU kernel performance for large language models. One approach uses language models as surrogates to predict kernel performance, significantly increasing the number of ca…
RESEARCH · CL_44050 · May 21 · 13:32

Paper reveals graph tokenization trade-offs for Transformer expressivity

A new paper explores the critical role of graph tokenization in applying Transformers to graph learning tasks. Researchers demonstrate that the method used to convert graph structures into tokens significantly impacts a…
