Mamba-2
PulseAugur coverage of Mamba-2: every cluster mentioning Mamba-2 across labs, papers, and developer communities, ranked by signal.
-
NVIDIA Nemotron 3 Nano: Open Model for Efficient AI Agents
NVIDIA has released Nemotron 3 Nano, a 30-billion-parameter open model designed for efficient reasoning and long-context applications. This model uses a hybrid Mixture-of-Experts architecture, activating only a frac…
-
NVIDIA unveils efficient Nemotron 3 LLM family with hybrid architecture
NVIDIA has released two new large language models, Nemotron 3 Nano and Nemotron 3 Ultra, focusing on efficiency and advanced capabilities. Nemotron 3 Nano is a 30B-class model designed for private inference and agentic …
-
Ternary Mamba achieves 3.61x compression via QAT with knowledge distillation
Researchers have developed a new method for compressing State Space Models (SSMs) like Mamba-2, significantly reducing their memory footprint for edge deployment. By employing grouped quantization-aware training (QAT) w…
-
New N-VSSM Model Outperforms Claude Opus 4.5 in Long-Form Narrative Consistency
Researchers have developed NarrativeWorldBench, a new benchmark designed to evaluate large language models (LLMs) on their ability to maintain narrative consistency in long-form audio dramas. Current frontier LLMs strug…
-
Compiler-first duality enables portable O(1) Mamba-2 inference
Researchers have developed a new method for optimizing Mamba-2 inference, focusing on compiler-first state space duality. This approach enables portable autoregressive caching with $O(1)$ complexity, eliminating the nee…
-
xLSTM outperforms Mamba-2 and DeltaNet in sequence modeling tasks
A new research paper compares three subquadratic architectures—xLSTM, Mamba-2, and Gated DeltaNet—for sequence modeling tasks. The study found that xLSTM outperformed the others in code-model pre-training, distillation,…
-
DF-SSM compresses Mamba-2 to 1-bit, boosting speed and reducing size
Researchers have developed Density Field State Space Models (DF-SSM), a novel framework for compressing large SSMs into a 1-bit scaffold with minimal performance loss. Applied to Mamba-2 1.3B, this method resulted in a …
-
Dynamic convolutions boost Transformer performance in LLMs
Researchers have introduced dynamic short convolutions as a new primitive to enhance Transformer architectures used in large language models. These dynamic convolutions utilize input-dependent filters, increasing expres…
-
Mamba-2 interpretation probes miss half of state sink
Researchers have identified a significant limitation in how Mamba-2's internal workings are understood. They found that standard probing techniques, which aim to link representational signatures to computational executi…
-
New framework unifies sequence models using Bayesian memory
Researchers have introduced a "design-model" framework for creating efficient recurrent sequence maps based on memory assumptions. This framework uses Bayesian filtering to write evidence into memory and a query-depende…
-
New Oryx Model Flexibly Switches Between Attention and Recurrent Mixers
Researchers have introduced Oryx, a novel hybrid model designed to flexibly switch between different sequence mixers, such as quadratic attention and linear recurrences, throughout a given sequence. This approach allows…
-
PapersWithCode adds multi-metric leaderboards and external paper support
Hugging Face has launched new features for PapersWithCode, a platform tracking AI state-of-the-art. The updates include support for multiple metrics on leaderboards, such as for Automatic Speech Recognition and Object D…
-
WriteSAE enables direct manipulation of recurrent language model states
Researchers have developed WriteSAE, a novel sparse autoencoder designed to manipulate the matrix updates within recurrent language model states. This method learns rank-1 matrix atoms that directly replace the model's …
-
NVIDIA unveils Gated DeltaNet-2 for improved linear attention
NVIDIA has introduced Gated DeltaNet-2, a new linear attention layer designed to improve memory editing in recurrent neural networks. This model separates the processes of erasing old information and writing new informa…
-
MambaGaze framework uses Mamba-2 for cognitive load assessment
Researchers have developed MambaGaze, a new framework designed to accurately assess cognitive load using eye-gaze tracking data. This system utilizes bidirectional Mamba-2 to efficiently model long-range temporal depend…
-
NVIDIA releases Nemotron-3 Ultra 550B LLM for advanced reasoning
NVIDIA has released its Nemotron-3 Ultra 550B model, a large language model designed for advanced reasoning and agentic workflows. This model features a hybrid LatentMoE architecture with Mamba-2 and attention layers, s…
-
REALM framework enables real-time LFP decoding for BCIs
Researchers have developed REALM, a new framework for real-time decoding of local field potentials (LFPs) in brain-computer interfaces. This method uses a retrospective distillation process to transfer knowledge from a …
-
Component-aware self-speculative decoding boosts hybrid language model inference
Researchers have developed a new method called component-aware self-speculative decoding, which enhances the efficiency of hybrid language models. This technique leverages the internal architectural differences within t…
-
Researchers explore optimal LoRA placement in hybrid language models
A new paper explores the optimal placement of LoRA adapters in hybrid language models, which combine attention and recurrent components. The research demonstrates that adapting the attention pathway is more effective th…
-
Together AI releases Mamba-3, prioritizing inference speed over training
Together AI has released Mamba-3, a new state space model (SSM) prioritizing inference efficiency over training speed. This model features a more expressive recurrence formula, complex-valued state tracking, and a multi…