I built a Mamba1 variant I call SM1 with d_state=1 that runs on Blackwell in pure PyTorch [P]

Developer creates SM1, a memory-efficient Mamba variant in PyTorch

By the PulseAugur editorial team · [1 source] · 2026-05-23 05:30

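A developer has created SM1, a variant of the Mamba1 architecture implemented in pure PyTorch that runs on NVIDIA Blackwell hardware. SM1 replaces the selective scan with two native PyTorch operations, which compute the exact closed-form solution of the recurrence when d_state=1. The design keeps memory use low: a 130-million-parameter model needs only 56 KB of inference state and, like other state-space models, no KV cache.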
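To see why two ops can replace the scan, here is one way to unroll the scalar recurrence. This is a sketch, not taken from the post: it assumes the usual Mamba1 update with a per-channel scalar state, and the symbols $a_t$ (the discretized decay, `dA` in the post), $u_t$ (the discretized input term $\bar{B}_t x_t$) and $L_t$ are our own notation.

```latex
\begin{aligned}
h_t &= a_t\,h_{t-1} + u_t, \qquad t = 1,\dots,T \\
h_t &= \Big(\prod_{s=1}^{t} a_s\Big) h_0 + \sum_{k=1}^{t} \Big(\prod_{s=k+1}^{t} a_s\Big) u_k \\
L_t &= \prod_{s=1}^{t} a_s \quad\Longrightarrow\quad \prod_{s=k+1}^{t} a_s = \frac{L_t}{L_k} \\
h_t &= L_t \Big(h_0 + \sum_{k=1}^{t} \frac{u_k}{L_k}\Big)
\end{aligned}
```

$L_t$ is a cumulative product over time and the inner sum is a cumulative sum, so the whole scan could be computed with `torch.cumprod` and `torch.cumsum`. The last step requires $L_k \neq 0$.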

Impact: This Mamba variant could make training and inference more efficient for some sequence-modeling tasks.

Ranking rationale: The developer built a new model variant on an existing architecture and described its technical implementation and optimizations in detail.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/MachineLearning · TIER_1 · English · /u/TechnoVoyager · 2026-05-23 05:30

I built a Mamba1 variant I call SM1 with d_state=1 that runs on Blackwell in pure PyTorch [P]

On Windows, mamba-ssm is not easily available and doesn't compile on sm_120. SM1 (Scalar Mamba1) replaces the entire selective scan with two native PyTorch ops: `L = torch.cumprod(dA, dim=1)` and `h = L * (h0.unsqueeze(1)…`
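The first op, `L = torch.cumprod(dA, dim=1)`, is visible above, but the second is cut off in this excerpt, so the following is not SM1's code. It is a minimal sketch, assuming `dA` and a precomputed input term `dBx` (a hypothetical name for the discretized $\bar{B}_t x_t$) both have shape (batch, seqlen, channels) with time on `dim=1`, and `h0` has shape (batch, channels). It solves h_t = dA_t · h_{t-1} + dBx_t in closed form as h_t = L_t · (h0 + Σ_{k≤t} dBx_k / L_k) and checks the result against a step-by-step loop. Both function names are hypothetical.

```python
import torch


def closed_form_scan(dA: torch.Tensor, dBx: torch.Tensor, h0: torch.Tensor) -> torch.Tensor:
    """Closed-form solution of h_t = dA_t * h_{t-1} + dBx_t for a scalar state.

    Hypothetical helper. dA, dBx: (batch, seqlen, channels); h0: (batch, channels).
    Returns h: (batch, seqlen, channels).
    """
    # L_t = dA_1 * dA_2 * ... * dA_t, the cumulative decay along time (dim=1)
    L = torch.cumprod(dA, dim=1)
    # h_t = L_t * (h0 + sum_{k<=t} dBx_k / L_k); assumes no L_k underflows to 0
    return L * (h0.unsqueeze(1) + torch.cumsum(dBx / L, dim=1))


def sequential_scan(dA: torch.Tensor, dBx: torch.Tensor, h0: torch.Tensor) -> torch.Tensor:
    """Reference loop that applies the recurrence one time step at a time."""
    h = h0
    out = []
    for t in range(dA.shape[1]):
        h = dA[:, t] * h + dBx[:, t]
        out.append(h)
    return torch.stack(out, dim=1)


if __name__ == "__main__":
    torch.manual_seed(0)
    # Decays in (exp(-0.1), 1], mimicking exp(delta * A) with A < 0
    dA = torch.exp(-0.1 * torch.rand(2, 16, 8, dtype=torch.float64))
    dBx = torch.randn(2, 16, 8, dtype=torch.float64)
    h0 = torch.randn(2, 8, dtype=torch.float64)
    print(torch.allclose(closed_form_scan(dA, dBx, h0), sequential_scan(dA, dBx, h0)))  # expected: True
```

Dividing by the cumulative product is exact in real arithmetic, but on long sequences L_t can shrink toward zero and `dBx / L` can overflow; chunking the sequence or working in log space are possible mitigations. The excerpt does not say whether SM1 uses either.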
