ENTITY transformers

PulseAugur coverage of transformers — every cluster mentioning transformers across labs, papers, and developer communities, ranked by signal.

Total · 30d

183

183 over 90d

Releases · 30d

0 over 90d

Papers · 30d

123

123 over 90d

TIER MIX · 90D

frontier release 7
significant 6
research 62
tool 98
commentary 10

TOPICS

paper 123
model release 91
other 58
product 55
infra 25
safety 18
opinion 5
policy 1

RELATIONSHIPS

used by KV cache 90%
used by vLLM 70%
used by llama.cpp 70%
used by Ollama 70%
competes with CNNs 70%
used by Unsloth 70%
competes with State space models: Univariate representation of a multivariate model, partial interpolation and periodic convergence 70%
used by AdamW 70%
instance of grokking 70%
used by llama-cpp-python 70%
used by functional magnetic resonance imaging 70%
developed by KV cache 70%

TIMELINE

2026-05-13 research_milestone A paper was published analyzing the impact of data representation and tokenization on Transformer context effectiveness.

SENTIMENT · 30D

25 days with sentiment data

RECENT · PAGE 4/10 · 183 TOTAL

RESEARCH · CL_51086 · May 26 · 04:00

Researchers Uncover How Transformers Achieve Analogical Reasoning

Two new research papers explore the mechanisms behind analogical reasoning in Transformer models. The first paper formalizes analogy as inferring correspondences between categories, identifying geometric alignment and f…
TOOL · CL_50923 · May 26 · 04:00

New method identifies attention-head circuits in transformers

Researchers have developed a novel three-step method called Spectral Probe-Circuits to identify specific computational circuits within pretrained transformer models. This technique uses a spectral signal to rank attenti…
TOOL · CL_60797 · May 25 · 19:37

Deep Learning Models Compared for Skin Cancer Detection

Researchers have conducted a comprehensive evaluation of twelve deep learning models for detecting skin cancer using a unified approach on the PAD-UFES-20 dataset. The study compared convolutional neural networks (CNNs)…
TOOL · CL_48970 · May 25 · 04:00

NextLat Transformers Learn Compact World Models for Better Generalization

Researchers have developed a new training method called Next-Latent Prediction (NextLat) for transformers, which encourages them to build more compact internal world models. This approach adds a self-supervised objectiv…
TOOL · CL_48893 · May 25 · 04:00

Certification Hard for Transformers and Circuits

A new research paper explores the difficulty of certifying the exact behavior of neural networks, particularly Transformers and circuits, even with minimal overparametrization. The study demonstrates that adding even a …
TOOL · CL_48721 · May 25 · 04:00

Tensor Cache enhances Transformer long-context memory

Researchers have developed a novel memory system called Tensor Cache for Transformers, designed to enhance their ability to handle long contexts. This system combines a sliding-window cache with a second-level fast-weig…
FRONTIER RELEASE · CL_58091 · May 23 · 02:13

Stepfun AI releases 198B parameter multimodal MoE model

Stepfun AI has released Step 3.7 Flash, a 198-billion parameter sparse Mixture-of-Experts (MoE) vision-language model. This model is optimized for agentic workflows, coding, and multimodal tasks, activating approximatel…
FRONTIER RELEASE · CL_69322 · May 23 · 01:17

Google DeepMind releases multimodal Gemma 4 12B models

Google DeepMind has released several variants of its Gemma 4 models, including the 12B parameter versions. These models are multimodal, capable of processing text, image, audio, and video inputs, with a focus on efficie…
TOOL · CL_45371 · May 23 · 00:55

Fixing local LLM OOM errors by optimizing KV cache and quantization

Running large open-source language models locally can lead to out-of-memory errors, even if the model's weights seem to fit within the available VRAM. This is primarily due to the significant memory required for the KV …
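
As a rough illustration of why the KV cache can exhaust VRAM even when the weights fit, the sketch below estimates cache size with the standard formula: two tensors (keys and values) per layer, each of shape KV heads × head dimension × sequence length × batch. The function name `kv_cache_bytes` and the model dimensions are hypothetical and are not taken from the linked article.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate KV cache size in bytes for a decoder-only transformer."""
    # Factor 2: one key tensor and one value tensor are stored per layer.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical model shape (an assumption for illustration only).
cfg = dict(num_layers=32, num_kv_heads=8, head_dim=128)

GIB = 1024 ** 3
fp16_gib = kv_cache_bytes(seq_len=32768, bytes_per_elem=2, **cfg) / GIB
int8_gib = kv_cache_bytes(seq_len=32768, bytes_per_elem=1, **cfg) / GIB

print(f"16-bit cache at 32k context: {fp16_gib:.1f} GiB")
print(f"8-bit cache at 32k context:  {int8_gib:.1f} GiB")
```

Under these assumed dimensions the cache grows linearly with context length and batch size, so halving the bytes per element or shortening the context are the two most direct levers.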
TOOL · CL_43454 · May 22 · 05:47

New CODA paper reframes Transformers as math problems

A new research paper introduces CODA, a novel approach to Transformers that reframes them as mathematical problems. This method aims to potentially revolutionize the architecture of neural networks. The paper is availab…
TOOL · CL_44898 · May 22 · 04:00

Transformers struggle with state-based decisions in search, new paper finds

Researchers have identified a critical limitation in how transformer models process serialized trajectory data during backtracking search. These models can struggle with 'scattered retrieval,' where state features are d…
TOOL · CL_44709 · May 22 · 04:00

LLM Pretraining Creates Generalizable Manifold for Time Series Forecasting

A new research paper explores how large language models (LLMs) pretrained on text can be effectively used for time-series forecasting. The study demonstrates that language pretraining equips transformers with a reusable…
RESEARCH · CL_48751 · May 22 · 00:00

LLMs and new frameworks boost GPU kernel optimization

Researchers are exploring novel ways to optimize GPU kernel performance for large language models. One approach uses language models as surrogates to predict kernel performance, significantly increasing the number of ca…
RESEARCH · CL_44050 · May 21 · 13:32

Paper reveals graph tokenization trade-offs for Transformer expressivity

A new paper explores the critical role of graph tokenization in applying Transformers to graph learning tasks. Researchers demonstrate that the method used to convert graph structures into tokens significantly impacts a…
SIGNIFICANT · CL_49676 · May 21 · 07:27

OpenBMB releases MiniCPM5-1B for on-device AI tasks

OpenBMB has released MiniCPM5-1B, a 1-billion parameter Transformer model designed for on-device and resource-constrained environments. This model claims state-of-the-art performance within its size class, particularly …
TOOL · CL_69323 · May 21 · 04:15

Hugging Face releases Qwen/Qwen-Image-Bench multimodal model

Hugging Face has released Qwen/Qwen-Image-Bench, a new multimodal model capable of processing both text and images. The model is accessible through various libraries and tools, including Transformers, vLLM, and SGLang. …
RESEARCH · CL_42474 · May 20 · 15:36

Deformba method enhances State Space Models for vision tasks

Researchers have introduced Deformba, a novel context-adaptive method designed to enhance the application of State Space Models (SSMs) to vision tasks. Deformba addresses limitations in existing vision SSMs by dynamical…
SIGNIFICANT · CL_44550 · May 20 · 15:29

Cohere releases open-source Command A+ AI model for enterprise agents

Cohere has released Command A+, an open-source, multimodal AI model designed for enterprise use and agentic tasks. This new model integrates reasoning, vision, and multilingual capabilities, supporting 48 languages and …
TOOL · CL_41851 · May 20 · 12:34

New HORST optimizer enhances sparse transformer training

Researchers have developed HORST, a novel optimizer designed to improve the training of sparse transformers. Standard optimizers struggle to balance the need for sparsity with training stability. HORST addresses this by…
RESEARCH · CL_41758 · May 20 · 10:23

New theory explains transformer generalization via Fourier spectra

Researchers have developed a new theoretical framework to understand how transformers generalize, focusing on the Fourier spectra of their target functions. This approach utilizes PAC-Bayes theory to derive generalizati…
