Mamba
PulseAugur coverage of Mamba — every cluster mentioning Mamba across labs, papers, and developer communities, ranked by signal.
- instance of State Space Model 90%
- instance of State Space Models 90%
- competes with State space models: Univariate representation of a multivariate model, partial interpolation and periodic convergence 70%
- used by State Space Model 70%
- competes with CNN 70%
- instance of State space models: Univariate representation of a multivariate model, partial interpolation and periodic convergence 60%
- instance of long short-term memory 60%
- affiliated with State Space Models 50%
8 days with sentiment data
-
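SO-Mamba advances MRI reconstruction with state space models
Researchers have developed SO-Mamba, a novel state space model for accelerated MRI reconstruction. The model improves on existing methods by distinguishing persistent reconstruction evidence from update-dependent information during its processing stages. SO-Mamba uses a state-ownership router to manage this evidence, improving the accuracy and anatomical coherence of MRI scans. Experiments on multiple public benchmarks show that SO-Mamba outperforms CNN, Transformer, and standard Mamba-style methods at unchanged computational efficiency.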
-
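Sparse Mamba Decoder improves speed and accuracy of quantum error correction
Researchers have developed a new neural network decoder, the Sparse Mamba Decoder (SMD), designed for quantum error correction. The decoder efficiently processes only the active error events rather than the entire syndrome array, which significantly reduces computational complexity. Compared with existing decoders, SMD shows higher accuracy and significantly faster processing across a range of benchmarks, including experimental data from Google Sycamore.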
-
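Researchers pinpoint Mamba model bottlenecks to improve performance
Researchers have identified and exploited activation subspace bottlenecks in the Mamba family of state space models (SSMs) to improve their performance. By applying a simple scalar multiplication to these bottleneck activations at test time, they achieved an average performance gain of 8.27% across multiple SSMs and benchmarks, with no task-specific tuning. Further validation, in which a modified architecture called Stable-Mamba was retrained, showed significant gains in long-context performance, confirming that the identified bottlenecks hinder performance.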
-
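New memory paging technique improves hybrid LLM inference efficiency
Researchers have developed a new memory management technique called Asymmetric Virtual Memory Paging (AVMP) to improve the efficiency of hybrid language models. These models combine Transformer layers with state space models (SSMs), which produces distinct memory cache types that current systems handle poorly. AVMP separates these cache types into different pools and allows capacity to migrate between them when needed, reducing out-of-memory events and significantly increasing request throughput.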
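The idea of separate pools with capacity migration can be illustrated with a toy allocator. This is only a sketch under stated assumptions, not AVMP's actual design: the class names `Pool` and `TwoPoolAllocator`, the pool labels `"kv"` and `"ssm"`, and the page-count accounting are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """A pool of fixed-size pages for one cache type (hypothetical)."""
    capacity: int  # pages currently owned by this pool
    used: int = 0  # pages currently handed out

    @property
    def free(self) -> int:
        return self.capacity - self.used


class TwoPoolAllocator:
    """Toy allocator: two pools that can lend each other free capacity."""

    def __init__(self, kv_pages: int, ssm_pages: int) -> None:
        self.pools = {"kv": Pool(kv_pages), "ssm": Pool(ssm_pages)}

    def allocate(self, kind: str, pages: int) -> bool:
        pool = self.pools[kind]
        other = self.pools["ssm" if kind == "kv" else "kv"]
        shortfall = pages - pool.free
        if shortfall > 0:
            # Not enough room locally: try to migrate free capacity
            # from the other pool instead of failing outright.
            if other.free < shortfall:
                return False  # would be an out-of-memory event
            other.capacity -= shortfall
            pool.capacity += shortfall
        pool.used += pages
        return True


alloc = TwoPoolAllocator(kv_pages=4, ssm_pages=4)
print(alloc.allocate("kv", 6))   # True: 2 pages migrate from the ssm pool
print(alloc.allocate("ssm", 3))  # False: only 2 ssm pages remain
```

With a fixed split, the first request would have failed even though four pages sat idle in the other pool; letting capacity move is what avoids that failure in this toy model.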
-
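STM3 model advances long-term spatio-temporal time series forecasting
Researchers have introduced STM3, a novel Mixture-of-Experts framework designed to enhance long-term spatio-temporal time series forecasting. The method integrates a multi-scale Mamba architecture with a Disentangled Mixture-of-Experts (DMoE) to capture diverse multi-scale information effectively. STM3 also uses an adaptive graph causal network to model complex spatial dependencies, and a stable routing strategy with causal contrastive learning to obtain robust representations. Experiments on ten real-world benchmark datasets show that STM3 achieves…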
-
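Pretraining data determines LLM scaling laws, study finds
Researchers found that pretraining data is the main determinant of loss-to-loss scaling laws in large language models. Their experiments show that factors such as model size, optimization hyperparameters, and even architectural differences between Transformers and state space models have limited influence on these scaling trends. The findings suggest that carefully curated pretraining datasets are essential for optimizing downstream performance, while other model configurations can be tuned for training efficiency.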
-
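SubQ launches 12M-context LLM with subquadratic attention
SubQ has launched a new frontier LLM, also named SubQ, with a context window of 12 million tokens and a novel subquadratic attention mechanism. The approach aims to overcome the computational limits of conventional quadratic attention, whose compute quadruples when the context length doubles. SubQ's learned sparse attention dynamically selects relevant token pairs at inference time, at significantly lower cost than full-attention models.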
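The quadrupling claim follows from counting query-key pairs: full self-attention scores every token against every token, so the pair count grows as the square of the context length. The sketch below only does that arithmetic. The function name is hypothetical, and it says nothing about how SubQ's own mechanism selects pairs.

```python
def full_attention_pairs(context_len: int) -> int:
    """Query-key pairs scored by full self-attention in one head."""
    return context_len * context_len


base = 4096
ratio = full_attention_pairs(2 * base) / full_attention_pairs(base)
print(ratio)  # 4.0: doubling the context quadruples the pair count

# At a 12-million-token context, full attention would score this many pairs:
print(full_attention_pairs(12_000_000))  # 144000000000000
```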
-
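Weight decay governs Transformer training regimes, revealing new diagnostics
Researchers found that weight decay is the key parameter governing Transformer training regimes on modular arithmetic tasks. They introduce two new, low-cost online diagnostics, the mean pairwise cosine similarity between attention heads and the standard deviation of entropy, to monitor the training dynamics of attention activations. Applied across a range of experimental conditions and model scales, these diagnostics effectively distinguish memorization, generalization (grokking), and collapse, and identify the specific transition point of the memorization-to-development boundary.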
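A rough sketch of what two such diagnostics could look like is given below. The exact definitions used by the researchers are not stated in this summary, so everything here is an assumption: each head's attention map is flattened into one vector for the cosine similarity, entropy is computed per query row, and the function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_head_cosine(heads):
    """heads: one attention map per head; each map is a list of rows
    (one row per query, each row a probability distribution over keys).
    Assumed definition: flatten each map, average cosine over head pairs."""
    flat = [[p for row in head for p in row] for head in heads]
    pairs = list(combinations(flat, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def row_entropy(row):
    """Shannon entropy (in nats) of one attention row."""
    return -sum(p * math.log(p) for p in row if p > 0)


def attention_entropy_std(heads):
    """Assumed definition: population standard deviation of the row
    entropies, pooled over all heads and queries."""
    ents = [row_entropy(row) for head in heads for row in head]
    mean = sum(ents) / len(ents)
    return math.sqrt(sum((e - mean) ** 2 for e in ents) / len(ents))


# Two heads attending to different keys: zero similarity, zero entropy spread.
disjoint = [[[1.0, 0.0]], [[0.0, 1.0]]]
print(mean_pairwise_head_cosine(disjoint))  # 0.0
print(attention_entropy_std(disjoint))      # 0.0
```

Under these assumed definitions, a cosine value near 1 would indicate that heads have become redundant, and a collapsing entropy spread would indicate that all rows are equally sharp or equally diffuse. Both are cheap to compute from attention maps already produced during training.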
-
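CogScale benchmark speeds up evaluation of AI sequence processing
Researchers have introduced CogScale, a new benchmark designed to efficiently evaluate the sequence-processing capabilities of AI architectures. The benchmark comprises 14 scalable synthetic tasks, allowing new designs to be validated quickly before extensive training. Initial evaluations with CogScale tested seven different architectures, including GRU, LSTM, Mamba, and Transformer variants, across a range of parameter budgets and difficulty levels.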
-
HexagonalWarriorMamba framework improves ECG cardiac abnormality classification
Researchers have developed HexagonalWarriorMamba (HWMamba), a novel framework based on the Mamba architecture for classifying cardiac abnormalities from 12-lead ECGs. This model treats ECGs as 2D images and incorporates…
-
New frameworks boost precipitation nowcasting with Mamba and diffusion models
Researchers have developed two new frameworks, MambaRain and VMU-Diff, to improve precipitation nowcasting accuracy for the crucial 0-3 hour window. MambaRain integrates Mamba's efficient long-range temporal modeling wi…
-
Phasor Memory Networks tackle gradient instability in explicit memory models
Researchers have introduced Phasor Memory Networks (PMNet), a novel architecture designed to overcome the gradient instability issues that have historically plagued explicit memory models. By employing Unitary Phasor Dy…
-
STAR framework boosts few-shot action recognition with LLM-guided temporal learning
Researchers have developed a new framework called STAR (Semantic-Temporal Adaptive Representation Learning) to improve few-shot action recognition in videos. This approach addresses issues of semantic-temporal misalignm…
-
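New attention methods tackle long-context challenges in large language models
Researchers are developing new attention mechanisms to handle the growing context lengths in large language models. One method, Runtime-Certified Bounded-Error Quantized Attention, uses a hierarchical KV cache to compress memory while guaranteeing a fallback to exact attention, preserving quality on tasks such as language modeling and retrieval. Another method, DashAttention, uses differentiable sparse hierarchical attention to adaptively select relevant tokens, reaching high sparsity at accuracy comparable to full attention and outperforming existing hierarchical methods.…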
-
Mamba-based neural decoder offers scalable solution for error-correcting codes
Researchers have developed a new neural decoder called MMPD, which utilizes Mamba state-space blocks to efficiently process long error-correcting codes. This attention-free approach significantly reduces memory and comp…
-
New Mamba-based network improves EEG decoding for stroke patients
Researchers have developed CFSPMNet, a novel framework designed to improve the decoding of motor imagery electroencephalography (MI-EEG) signals for stroke patients. This new model addresses the challenge of cross-patie…
-
NVIDIA Star Elastic embeds multiple reasoning models in one checkpoint
NVIDIA researchers have introduced Star Elastic, a novel post-training method that embeds multiple reasoning models of varying parameter sizes within a single checkpoint. This approach allows for the extraction of small…
-
VIMCAN network fuses Mamba and attention for real-time 3D human pose estimation
Researchers have developed VIMCAN, a novel hybrid network for visual-inertial 3D human pose estimation. This architecture integrates Mamba's efficient sequence modeling with Cross-Attention's spatial reasoning capabilit…
-
New research links neural network OOD generalization to feature engineering
Researchers have identified that deep neural networks often fail to learn representations that generalize to out-of-distribution (OOD) data because they cannot decouple feature learning from data-generating process iden…
-
GEM model generates LiDAR world models for autonomous driving
Researchers have developed GEM, a generative LiDAR world model designed to simulate environmental dynamics for autonomous driving. The model utilizes a deformable Mamba architecture to overcome challenges with disordere…