BLASST: Dynamic BLocked Attention Sparsity via Softmax Thresholding

BLASST paper introduces dynamic sparse attention to speed up LLM inference

By the PulseAugur editorial team · [1 source] · 2026-04-29 04:00

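Researchers have developed BLASST, a sparse attention mechanism designed to accelerate inference for large language models with long contexts. The drop-in method dynamically skips attention blocks using a simple softmax threshold, and requires no training or precomputation. BLASST delivers significant speedups for both prefill and decode across a range of attention variants while maintaining benchmark accuracy.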
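The idea of skipping blocks by a softmax threshold can be illustrated with a minimal sketch. This is not the paper's implementation: the summary does not spell out the exact skip criterion, so the code below assumes a block is dropped when its largest score falls so far below the running maximum of an online softmax that every weight in the block would be under the threshold. The function name `blocked_attention` and its parameters `block_size` and `threshold` are hypothetical, and the sketch handles a single query vector in pure Python rather than on a GPU.

```python
import math

def blocked_attention(q, keys, values, block_size=4, threshold=0.0):
    """Single-query attention over key/value blocks with an online softmax.

    Assumed skip rule (illustrative only): drop a block when
    block_max - running_max < log(threshold), i.e. when every softmax
    weight in the block would be below `threshold` relative to the
    largest score seen so far. threshold=0.0 disables skipping.
    Returns (output_vector, number_of_skipped_blocks).
    """
    scale = 1.0 / math.sqrt(len(q))
    log_thr = math.log(threshold) if threshold > 0.0 else -math.inf
    running_max = -math.inf   # largest score seen so far
    denom = 0.0               # running softmax denominator
    acc = [0.0] * len(values[0])  # running weighted sum of values
    skipped = 0

    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_blk]
        blk_max = max(scores)

        # Negligible block: skip the exponentials and the value accumulation.
        if blk_max - running_max < log_thr:
            skipped += 1
            continue

        # Standard online-softmax update: rescale old state to the new max.
        new_max = max(running_max, blk_max)
        rescale = math.exp(running_max - new_max)  # 0.0 on the first block
        denom *= rescale
        acc = [a * rescale for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            denom += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max

    return [a / denom for a in acc], skipped
```

Calling the function once with `threshold=0.0` and once with a small positive value such as `1e-3` on the same inputs gives a quick way to compare the dense and thresholded outputs and to count how many blocks were skipped. In this sketch the first block is never skipped, because the running maximum starts at negative infinity.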

Impact: Speeds up long-context LLM inference, which could lower operating costs and improve user experience.

Ranking rationale: This is an academic paper introducing a new technical method for improving LLM inference.

AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.CL TIER_1 English(EN) · Jiayi Yuan, Cameron Shinn, Kai Xu, Jingze Cui, George Klimiashvili, Guangxuan Xiao, Perkz Zheng, Bo Li, Yuxin Zhou, Zhouhai Ye, Weijie You, Tian Zheng, Dominic Brown, Pengbo Wang, Markus Hoehnerbach, Richard Cai, Julien Demouth, John D. Owens, Xia Hu, Son · 2026-04-29 04:00

BLASST: Dynamic BLocked Attention Sparsity via Softmax Thresholding

arXiv:2512.12087v3 Announce Type: replace Abstract: The growing demand for long-context inference capabilities in Large Language Models (LLMs) has intensified the computational and memory bottlenecks inherent to the self-attention mechanism. To address this challenge, we introduc…