State Space Models
PulseAugur coverage of State Space Models — every cluster mentioning State Space Models across labs, papers, and developer communities, ranked by signal.
Sentiment data available for 2 days
-
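Researchers pinpoint Mamba model bottlenecks and improve performance
Researchers have identified and exploited activation-subspace bottlenecks in the Mamba family of State Space Models (SSMs) to improve performance. By applying a simple scalar multiplication to these bottleneck activations at test time, they achieved an average performance gain of 8.27% across multiple SSMs and benchmarks, with no task-specific tuning. Further validation, which retrained a modified architecture called Stable-Mamba, showed significant gains in long-context performance, confirming that the identified bottlenecks hinder performance.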
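The test-time intervention described above amounts to rescaling a subset of activations by one scalar. The following is a minimal sketch of that idea, not the authors' code: it assumes the bottleneck subspace is axis-aligned (a set of hidden dimensions), and the names `scale_bottleneck`, `bottleneck_dims`, and `scale` are hypothetical.

```python
def scale_bottleneck(activations, bottleneck_dims, scale):
    """Multiply selected activation dimensions by a single scalar.

    activations: one hidden vector as a list of floats
    bottleneck_dims: indices assumed (for illustration) to span the bottleneck
    scale: scalar multiplier applied at test time, with no retraining
    """
    dims = set(bottleneck_dims)
    return [x * scale if i in dims else x for i, x in enumerate(activations)]


# Toy hidden vector; dimensions 1 and 2 play the role of the bottleneck.
hidden = [0.5, -1.0, 2.0, 0.25]
scaled = scale_bottleneck(hidden, bottleneck_dims=[1, 2], scale=1.5)
print(scaled)  # [0.5, -1.5, 3.0, 0.25]
```

How the bottleneck dimensions and the scale value are chosen is not stated in the summary, so both are left as plain inputs here.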
-
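New memory paging technique improves hybrid LLM inference efficiency
Researchers have developed a new memory management technique, Asymmetric Virtual Memory Paging (AVMP), to improve the efficiency of hybrid language models. These models combine Transformer layers with State Space Models (SSMs), which produces distinct types of memory cache that current systems handle poorly. AVMP separates these cache types into different pools and lets capacity migrate between them when needed, reducing out-of-memory events and significantly increasing request throughput.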
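The pool-separation idea can be sketched as two page pools that lend each other free capacity. This is an illustrative toy, not the AVMP implementation: the class names, the pool labels `"kv"` and `"state"`, and the page-granular accounting are all assumptions.

```python
class Pool:
    """A fixed budget of pages for one cache type."""

    def __init__(self, capacity):
        self.capacity = capacity  # pages currently owned by this pool
        self.used = 0             # pages handed out to requests

    @property
    def free(self):
        return self.capacity - self.used


class TwoPoolAllocator:
    """Separate pools per cache type, with capacity migration on demand."""

    def __init__(self, kv_pages, state_pages):
        # "kv" and "state" are hypothetical labels for the two cache types.
        self.pools = {"kv": Pool(kv_pages), "state": Pool(state_pages)}

    def allocate(self, kind, pages):
        pool = self.pools[kind]
        other = self.pools["state" if kind == "kv" else "kv"]
        shortfall = pages - pool.free
        if shortfall > 0:
            if other.free < shortfall:
                return False  # neither pool can cover it: out of memory
            # Migrate only the missing capacity from the other pool.
            other.capacity -= shortfall
            pool.capacity += shortfall
        pool.used += pages
        return True


alloc = TwoPoolAllocator(kv_pages=4, state_pages=4)
first = alloc.allocate("kv", 6)      # borrows 2 pages from the state pool
second = alloc.allocate("state", 3)  # only 2 state pages remain, kv is full
print(first, second)  # True False
```

With a single static split, the first request would have failed even though four pages were idle in the other pool. Letting capacity move between pools avoids that kind of out-of-memory event.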
-
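New T-SNL algorithm improves inference for State Space Models
Researchers have developed a new algorithm, Truncated Neural Likelihood (T-SNL), to improve parameter inference in State Space Models (SSMs). Existing methods such as Sequential Neural Likelihood (SNL) struggle with sample efficiency and scalability on long sequences. T-SNL addresses these limitations with an approach that is more accurate, more stable, and more amortized, and it outperforms previous methods in sample efficiency and robustness.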
-
Deformba method enhances State Space Models for vision tasks
Researchers have introduced Deformba, a novel context-adaptive method designed to enhance the application of State Space Models (SSMs) to vision tasks. Deformba addresses limitations in existing vision SSMs by dynamical…
-
Looped SSMs improve time series classification with depth-recurrence
Researchers have introduced Looped SSMs, a novel approach to State Space Models for time series classification. This method enhances performance by applying depth-recurrence, where model blocks are reused across layers,…
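Depth-recurrence, as described above, means the same block (and so the same parameters) is applied several times in sequence instead of stacking distinct layers. The sketch below illustrates this with a toy scalar linear recurrence standing in for an SSM block. The functions `ssm_block` and `looped`, and the parameters `a` and `b`, are hypothetical and not taken from the paper.

```python
def ssm_block(xs, a, b):
    """Toy scalar linear recurrence: h_t = a * h_{t-1} + b * x_t.

    Returns the sequence of hidden states, one per input step.
    """
    h = 0.0
    out = []
    for x in xs:
        h = a * h + b * x
        out.append(h)
    return out


def looped(xs, a, b, loops):
    """Apply the same block `loops` times, reusing (a, b) at every depth."""
    for _ in range(loops):
        xs = ssm_block(xs, a, b)
    return xs


sequence = [1.0, 0.0, 0.0]
result = looped(sequence, a=0.5, b=1.0, loops=2)
print(result)  # [1.0, 1.0, 0.75]
```

Because the parameters are shared, increasing `loops` adds effective depth without adding parameters.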
-
Quantum memory approach enhances long-sequence token modeling
Researchers have developed QLAM, a novel hybrid quantum-classical memory mechanism designed to enhance long-sequence token modeling. QLAM represents the hidden state as a quantum state, leveraging superposition to encod…
-
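Recurrent models fail at state tracking because of error dynamics
Researchers have introduced a new perspective on state tracking in recurrent neural network architectures, one that emphasizes error-control dynamics rather than theoretical expressivity. They demonstrate that affine recurrent networks, including State Space Models and linear attention, struggle with robust state tracking because they cannot correct errors along state-separating subspaces. This limitation leads to finite-horizon solutions governed by accumulated error, and tracking accuracy degrades predictably as the distinguishability ratio crosses a critical threshold.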
-
New paper proves AI models face 'Impossibility Triangle' trade-off
Researchers have identified a fundamental trade-off in long-context models, proving that no single architecture can simultaneously achieve efficiency, compactness, and recall. The study formalizes this "Impossibility Tr…
-
New method aligns State Space Model inductive bias for better data efficiency
Researchers have developed a new framework to align the inductive bias of State Space Models (SSMs) for improved data efficiency. This method, called Task-Dependent Initialization (TDI), matches the model's initial bias…
-
New AI models tackle image and video restoration with advanced techniques
Researchers have developed several new methods for image and video restoration tasks. One approach, Continuous Expert Assembly (CEA), uses a dynamic parameterization framework to adapt to diverse local degradation patte…
-
PKS4 scanners offer efficient video understanding with 10x lower training compute
Researchers have introduced PKS$^4$, a novel approach to efficient video understanding that addresses the computational challenges of long video sequences. This method integrates a plug-and-play module with linear-compl…
-
StateX framework boosts RNN recall by expanding model states post-training
Researchers have developed StateX, a post-training framework designed to improve the recall capabilities of recurrent neural networks (RNNs). This method efficiently expands the states of pre-trained RNNs, such as linea…
-
Apple researchers unveil parallel RNN training and enhanced SSMs at ICLR 2026
Apple researchers are presenting new work at ICLR 2026, focusing on advancements in recurrent neural networks (RNNs) and state space models (SSMs). Their paper "ParaRNN" introduces a parallelized training framework that…