transformers
PulseAugur coverage of transformers — every cluster mentioning transformers across labs, papers, and developer communities, ranked by signal.
- used by KV cache 90%
- used by vLLM 70%
- used by llama.cpp 70%
- used by Ollama 70%
- competes with CNNs 70%
- used by Unsloth 70%
- competes with State space models: Univariate representation of a multivariate model, partial interpolation and periodic convergence 70%
- used by AdamW 70%
- instance of grokking 70%
- used by llama-cpp-python 70%
- used by functional magnetic resonance imaging 70%
- developed by KV cache 70%
- 2026-05-13 research_milestone A paper was published analyzing the impact of data representation and tokenization on Transformer context effectiveness.
-
Researchers Uncover How Transformers Achieve Analogical Reasoning
Two new research papers explore the mechanisms behind analogical reasoning in Transformer models. The first paper formalizes analogy as inferring correspondences between categories, identifying geometric alignment and f…
-
New method identifies attention-head circuits in transformers
Researchers have developed a novel three-step method called Spectral Probe-Circuits to identify specific computational circuits within pretrained transformer models. This technique uses a spectral signal to rank attenti…
-
Deep Learning Models Compared for Skin Cancer Detection
Researchers have conducted a comprehensive evaluation of twelve deep learning models for detecting skin cancer using a unified approach on the PAD-UFES-20 dataset. The study compared convolutional neural networks (CNNs)…
-
NextLat Transformers Learn Compact World Models for Better Generalization
Researchers have developed a new training method called Next-Latent Prediction (NextLat) for transformers, which encourages them to build more compact internal world models. This approach adds a self-supervised objectiv…
-
Certification Hard for Transformers and Circuits
A new research paper explores the difficulty of certifying the exact behavior of neural networks, particularly Transformers and circuits, even with minimal overparametrization. The study demonstrates that adding even a …
-
Tensor Cache enhances Transformer long-context memory
Researchers have developed a novel memory system called Tensor Cache for Transformers, designed to enhance their ability to handle long contexts. This system combines a sliding-window cache with a second-level fast-weig…
-
Stepfun AI releases 198B parameter multimodal MoE model
Stepfun AI has released Step 3.7 Flash, a 198-billion parameter sparse Mixture-of-Experts (MoE) vision-language model. This model is optimized for agentic workflows, coding, and multimodal tasks, activating approximatel…
-
Google DeepMind releases multimodal Gemma 4 12B models
Google DeepMind has released several variants of its Gemma 4 models, including the 12B parameter versions. These models are multimodal, capable of processing text, image, audio, and video inputs, with a focus on efficie…
-
Fixing local LLM OOM errors by optimizing KV cache and quantization
Running large open-source language models locally can lead to out-of-memory errors, even if the model's weights seem to fit within the available VRAM. This is primarily due to the significant memory required for the KV …
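As a rough illustration of why the KV cache can exhaust VRAM even when the weights fit, the sketch below estimates cache size from the standard accounting of one key and one value tensor per layer. The function name and the model shape are hypothetical, chosen only for the arithmetic; they are not taken from the linked article or from any specific model.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_element=2):
    """Approximate KV cache size in bytes.

    Each layer stores one key and one value tensor (hence the factor 2),
    each of shape [batch_size, num_kv_heads, seq_len, head_dim].
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_element)


GIB = 1024 ** 3

# Hypothetical model shape (an assumption for illustration only):
# 32 layers, 8 KV heads, head dimension 128, 32,768-token context.
fp16_cache = kv_cache_bytes(32, 8, 128, seq_len=32768, bytes_per_element=2)
int8_cache = kv_cache_bytes(32, 8, 128, seq_len=32768, bytes_per_element=1)

print(f"16-bit KV cache: {fp16_cache / GIB:.1f} GiB")
print(f"8-bit KV cache:  {int8_cache / GIB:.1f} GiB")
```

Under these assumptions the cache grows linearly with context length and batch size, so halving the bytes per element or the context length halves the cache footprint. Real runtimes add their own overhead, so treat the result as a lower bound rather than an exact figure.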
-
New CODA paper reframes Transformers as math problems
A new research paper introduces CODA, an approach that reframes Transformers as mathematical problems, with the stated aim of changing how neural network architectures are designed. The paper is availab…
-
Transformers struggle with state-based decisions in search, new paper finds
Researchers have identified a critical limitation in how transformer models process serialized trajectory data during backtracking search. These models can struggle with 'scattered retrieval,' where state features are d…
-
LLM Pretraining Creates Generalizable Manifold for Time Series Forecasting
A new research paper explores how large language models (LLMs) pretrained on text can be effectively used for time-series forecasting. The study demonstrates that language pretraining equips transformers with a reusable…
-
LLMs and new frameworks boost GPU kernel optimization
Researchers are exploring novel ways to optimize GPU kernel performance for large language models. One approach uses language models as surrogates to predict kernel performance, significantly increasing the number of ca…
-
Paper reveals graph tokenization trade-offs for Transformer expressivity
A new paper explores the critical role of graph tokenization in applying Transformers to graph learning tasks. Researchers demonstrate that the method used to convert graph structures into tokens significantly impacts a…
-
OpenBMB releases MiniCPM5-1B for on-device AI tasks
OpenBMB has released MiniCPM5-1B, a 1-billion parameter Transformer model designed for on-device and resource-constrained environments. This model claims state-of-the-art performance within its size class, particularly …
-
Hugging Face releases Qwen/Qwen-Image-Bench multimodal model
Hugging Face has released Qwen/Qwen-Image-Bench, a new multimodal model capable of processing both text and images. The model is accessible through various libraries and tools, including Transformers, vLLM, and SGLang. …
-
Deformba method enhances State Space Models for vision tasks
Researchers have introduced Deformba, a novel context-adaptive method designed to enhance the application of State Space Models (SSMs) to vision tasks. Deformba addresses limitations in existing vision SSMs by dynamical…
-
Cohere releases open-source Command A+ AI model for enterprise agents
Cohere has released Command A+, an open-source, multimodal AI model designed for enterprise use and agentic tasks. This new model integrates reasoning, vision, and multilingual capabilities, supporting 48 languages and …
-
New HORST optimizer enhances sparse transformer training
Researchers have developed HORST, a novel optimizer designed to improve the training of sparse transformers. Standard optimizers struggle to balance the need for sparsity with training stability. HORST addresses this by…
-
New theory explains transformer generalization via Fourier Spectra
Researchers have developed a new theoretical framework to understand how transformers generalize, focusing on the Fourier Spectra of their target functions. This approach utilizes PAC-Bayes theory to derive generalizati…