The Evolution of Transformer Attention Mechanisms in Open-Source AI

By the PulseAugur editorial team · [8 sources] · 2026-06-29 01:00

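Since the Transformer architecture was introduced, its attention mechanism has evolved significantly, and many innovations have contributed to more efficient and more capable large language models. FlashAttention, Multi-Query Attention (MQA), Grouped-Query Attention (GQA), and Sliding Window Attention (SWA) have greatly reduced memory requirements and improved performance, particularly at inference time. More recent advances are pushing the boundaries further, and many open-source models have adopted them. These include linear attention variants such as Gated Delta Networks (GDNs) and sparse attention methods such as DeepSeek Sparse Attention (DSA).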

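These advances in attention are critical to improving LLM efficiency and enabling longer context windows. They directly affect model performance and accessibility.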

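Ranking rationale: This cluster details advances in Transformer attention mechanisms, including specific techniques and their adoption in open-source models.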

AI-generated summary · Google Gemini · from 8 sources.

Sources [8]

X — SemiAnalysis TIER_1 English(EN) · SemiAnalysis_ · 2026-06-29 01:00

@ashVaswani @NoamShazeer @YesThisIsLion @metaai @MistralAI @tri_dao @SonglinYang4 Around the same time, the vLLM inference engine and its underlying Paged Attention took the open-source community by storm. Started by @woosuk_k, the @vllm_project has become one of the most widely …
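PagedAttention, as introduced with vLLM, stores each sequence's KV cache in fixed-size blocks that need not be contiguous in memory. A per-sequence block table maps logical blocks to physical ones, much like virtual-memory paging. The minimal sketch below models only that bookkeeping in plain Python and shows no attention math. The class `BlockAllocator`, its methods, and the toy pool and block sizes are hypothetical and are not vLLM's API.

```python
class BlockAllocator:
    """Hypothetical toy paged KV-cache manager (bookkeeping only, no tensors)."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size                  # tokens per physical block
        self.free_blocks = list(range(num_blocks))    # physical block ids not in use
        self.block_tables: dict[int, list[int]] = {}  # seq_id -> physical block ids
        self.seq_lens: dict[int, int] = {}            # seq_id -> tokens stored so far

    def append_token(self, seq_id: int) -> tuple[int, int]:
        """Reserve a KV slot for one more token; return (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        if length % self.block_size == 0:
            # Last block is full (or the sequence has none yet): take any free block.
            if not self.free_blocks:
                raise MemoryError("KV cache pool exhausted")
            table.append(self.free_blocks.pop(0))
        self.seq_lens[seq_id] = length + 1
        return table[length // self.block_size], length % self.block_size

    def free(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


alloc = BlockAllocator(num_blocks=8, block_size=4)
for _ in range(6):
    alloc.append_token(seq_id=0)  # 6 tokens -> 2 blocks
for _ in range(3):
    alloc.append_token(seq_id=1)  # 3 tokens -> 1 block
print(alloc.block_tables)  # {0: [0, 1], 1: [2]}
```

Memory is reserved one block at a time. In this sketch, each sequence therefore wastes at most `block_size - 1` slots instead of a pre-allocated maximum-length buffer.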
X — SemiAnalysis TIER_1 English(EN) · SemiAnalysis_ · 2026-06-29 01:00

@ashVaswani @NoamShazeer @YesThisIsLion @metaai @MistralAI @tri_dao @SonglinYang4 As ChatGPT exploded in popularity, research on LLM serving became highly active. Efficient LLM serving remained a major challenge until the invention of KV cache-managing Attention methods, such as …
X — SemiAnalysis TIER_1 English(EN) · SemiAnalysis_ · 2026-06-29 01:00

@ashVaswani @NoamShazeer @YesThisIsLion @metaai @MistralAI @tri_dao The long-context demands of agentic AI accelerated attention research aimed at overcoming the context wall. Over the past year, linear attention has become mainstream, most notably with Gated Delta Networks (GDNs…
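Linear-attention layers such as GDNs replace the ever-growing KV cache with a fixed-size state matrix that is updated once per token. The sketch below follows the gated delta rule as we understand it from the Gated DeltaNet paper:

S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T, with output o_t = S_t q_t

Here alpha_t is a decay gate and beta_t is a write strength. This is a sequential, plain-Python illustration only. Production kernels use chunked parallel algorithms, normalize keys, and learn alpha and beta from the input; here they are passed in directly. The function name `gated_delta_rule` is hypothetical.

```python
def gated_delta_rule(qs, ks, vs, alphas, betas):
    """Sequential gated delta rule over lists of vectors; returns one output per token."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_k for _ in range(d_v)]  # S is d_v x d_k, independent of length
    outputs = []
    for q, k, v, alpha, beta in zip(qs, ks, vs, alphas, betas):
        # What the memory currently returns for key k: S_{t-1} k_t
        pred = [sum(state[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
        # S_t = alpha * (S - beta * (S k) k^T) + beta * v k^T
        for i in range(d_v):
            for j in range(d_k):
                state[i][j] = alpha * (state[i][j] - beta * pred[i] * k[j]) + beta * v[i] * k[j]
        # Read out with the query: o_t = S_t q_t
        outputs.append([sum(state[i][j] * q[j] for j in range(d_k)) for i in range(d_v)])
    return outputs


e1, e2 = [1.0, 0.0], [0.0, 1.0]
out = gated_delta_rule(
    qs=[e1, e1, e2],
    ks=[e1, e1, e2],
    vs=[[2.0, 3.0], [5.0, 7.0], [1.0, 1.0]],
    alphas=[1.0, 1.0, 1.0],
    betas=[1.0, 1.0, 1.0],
)
print(out)  # [[2.0, 3.0], [5.0, 7.0], [1.0, 1.0]]
```

The second write to key `e1` overwrites the first instead of adding to it. Plain linear attention would have accumulated to `[7.0, 10.0]`. Setting alpha below 1 would additionally decay older memories.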
X — SemiAnalysis TIER_1 English(EN) · SemiAnalysis_ · 2026-06-29 01:00

@ashVaswani @NoamShazeer @YesThisIsLion @metaai @MistralAI @tri_dao Innovation in attention mechanisms did not stop, even though MHA/GQA/SWA remain hard to beat. In late 2024 and early 2025, DeepSeek-V3/R1 demonstrated near-frontier capabilities, proving the effectiveness of their in-house Multi-Hea…
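Multi-Head Latent Attention, as described in DeepSeek's V2 and V3 technical reports, shrinks the KV cache in two steps. It caches one low-rank latent vector per token, then reconstructs per-head keys and values from that latent with up-projections. The toy sketch below shows only that caching pattern, using random weights and made-up sizes. It omits the decoupled RoPE key and the query compression that DeepSeek's MLA also uses. The names `ToyMLACache` and `floats_per_token` are hypothetical.

```python
import random


def matvec(m, x):
    """Multiply matrix m (a list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


class ToyMLACache:
    """Caches one low-rank latent per token instead of full per-head K and V."""

    def __init__(self, d_model, d_latent, n_heads, d_head, seed=0):
        rng = random.Random(seed)
        self.w_down = rand_matrix(d_latent, d_model, rng)           # h_t -> c_t
        self.w_up_k = rand_matrix(n_heads * d_head, d_latent, rng)  # c_t -> k_t
        self.w_up_v = rand_matrix(n_heads * d_head, d_latent, rng)  # c_t -> v_t
        self.latents = []  # the only per-token state that is stored

    def append(self, h):
        self.latents.append(matvec(self.w_down, h))

    def keys_values(self):
        # Keys and values are rebuilt from the cached latents on demand.
        keys = [matvec(self.w_up_k, c) for c in self.latents]
        values = [matvec(self.w_up_v, c) for c in self.latents]
        return keys, values


def floats_per_token(n_heads, d_head, d_latent):
    """Cached values per token per layer: (standard MHA, this toy MLA)."""
    return 2 * n_heads * d_head, d_latent


cache = ToyMLACache(d_model=64, d_latent=16, n_heads=4, d_head=8)
for _ in range(3):
    cache.append([1.0] * 64)
keys, values = cache.keys_values()
print(len(cache.latents[0]), len(keys[0]))  # 16 32
print(floats_per_token(n_heads=4, d_head=8, d_latent=16))  # (64, 16)
```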
X — SemiAnalysis TIER_1 English(EN) · SemiAnalysis_ · 2026-06-29 01:00

@ashVaswani @NoamShazeer @YesThisIsLion @metaai @MistralAI One of the greatest leaps since MHA was FlashAttention by @tri_dao. FlashAttention dramatically reduced memory requirements for both the forward and backward passes of attention, unlocking major performance gains and enab…
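A key ingredient of FlashAttention is the online softmax. Keys and values are processed in tiles while a running maximum and running sum are kept per query. Earlier partial results are rescaled whenever the maximum grows, so the full score matrix is never materialized. The plain-Python sketch below shows that arithmetic for a single query and checks it against a naive implementation. The function names are hypothetical. The real FlashAttention is a fused GPU kernel that also tiles queries, works in on-chip SRAM, and recomputes attention in the backward pass; none of that is modeled here.

```python
import math


def naive_attention(q, keys, values):
    """Reference: softmax(q . k / sqrt(d)) over all keys, then a weighted sum of values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(len(values[0]))]


def tiled_attention(q, keys, values, block_size=2):
    """Online softmax over key/value tiles; only one tile of scores exists at a time."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf       # largest score seen so far
    running_sum = 0.0             # sum of exp(score - running_max) so far
    acc = [0.0] * len(values[0])  # unnormalized weighted sum of values
    for start in range(0, len(keys), block_size):
        k_tile = keys[start:start + block_size]
        v_tile = values[start:start + block_size]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_tile]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated under the old maximum (exp(-inf) == 0.0).
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_tile):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vi for a, vi in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


q = [0.5, -1.0, 2.0]
keys = [[1.0, 0.0, 0.5], [0.2, 0.3, -1.0], [2.0, 1.0, 0.0], [-0.5, 0.5, 1.5], [0.0, 0.0, 1.0]]
values = [[1.0, 2.0], [3.0, -1.0], [0.0, 0.5], [2.0, 2.0], [-1.0, 1.0]]
ref = naive_attention(q, keys, values)
out = tiled_attention(q, keys, values, block_size=2)  # tiles of 2, 2, and 1 keys
print(all(abs(a - b) < 1e-9 for a, b in zip(ref, out)))  # True
```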
X — SemiAnalysis TIER_1 English(EN) · SemiAnalysis_ · 2026-06-29 01:00

@ashVaswani @NoamShazeer @YesThisIsLion The early variants of MHA include Multi-Query Attention (MQA), invented by Noam Shazeer; Grouped-Query Attention (GQA), proposed by Google Research and popularized by the @MetaAI LLaMA team; and Sliding Window Attention (SWA), popularized by @MistralAI. MQA, GQA, and SWA bui…
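In GQA, query heads are split into groups, and each group shares a single key/value head. MQA is the special case of one K/V head, and standard MHA is the case of one K/V head per query head. The KV cache shrinks in proportion to the number of K/V heads. SWA instead limits each position to attending over the most recent `window` positions. The helpers below are a hypothetical plain-Python sketch of those ideas. The layer, head, and length values in the example are illustrative and are not taken from any particular model.

```python
def kv_head_for_query_head(q_head, n_q_heads, n_kv_heads):
    """GQA: consecutive groups of query heads share one K/V head."""
    assert n_q_heads % n_kv_heads == 0, "query heads must split evenly into groups"
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


def kv_cache_floats(seq_len, n_layers, n_kv_heads, d_head):
    """Number of cached K and V values for one sequence (K and V per layer)."""
    return 2 * seq_len * n_layers * n_kv_heads * d_head


def sliding_window_mask(seq_len, window):
    """SWA: query i may see key j only if i - window < j <= i (causal and windowed)."""
    return [[i - window < j <= i for j in range(seq_len)] for i in range(seq_len)]


n_q = 8
print([kv_head_for_query_head(h, n_q, n_kv_heads=2) for h in range(n_q)])  # [0, 0, 0, 0, 1, 1, 1, 1]
for name, n_kv in [("MHA", 8), ("GQA", 2), ("MQA", 1)]:
    print(name, kv_cache_floats(seq_len=4096, n_layers=32, n_kv_heads=n_kv, d_head=128))
for row in sliding_window_mask(seq_len=5, window=3):
    print("".join("x" if visible else "." for visible in row))
```

With these illustrative sizes, the cache drops 4x from MHA to GQA and 8x from MHA to MQA. Each mask row shows at most three visible positions.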
X — SemiAnalysis TIER_1 English(EN) · SemiAnalysis_ · 2026-06-29 01:00

In contrast to the slow decline of the Transformers movie series in 2017, the Transformer architecture in NLP showed immense potential. It introduced Multi-Head Attention (MHA) and dramatically improved perplexity scores. We thank @ashVaswani, @NoamShazeer, @YesThisIsLion, and ht…
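Multi-Head Attention runs scaled dot-product attention, softmax(Q K^T / sqrt(d_head)) V, independently in several heads, each over its own slice of the model dimension. It then concatenates the head outputs. The minimal plain-Python sketch below assumes Q, K, and V are already projected. It omits the learned input and output projections and any causal mask. The function name is hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_attention(Q, K, V, n_heads):
    """Q, K, V: lists of already-projected token vectors of width d_model."""
    d_model = len(Q[0])
    assert d_model % n_heads == 0, "d_model must split evenly across heads"
    d_head = d_model // n_heads
    scale = 1.0 / math.sqrt(d_head)
    out = [[] for _ in Q]
    for h in range(n_heads):
        lo, hi = h * d_head, (h + 1) * d_head  # this head's slice of each vector
        for t, q in enumerate(Q):
            scores = [scale * sum(a * b for a, b in zip(q[lo:hi], k[lo:hi])) for k in K]
            weights = softmax(scores)
            # Weighted sum of this head's value slices, appended to token t's output.
            out[t].extend(sum(w * v[i] for w, v in zip(weights, V)) for i in range(lo, hi))
    return out


Q = [[1.0, 0.0, 0.0, 1.0]]
K = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]  # identical keys -> uniform weights
V = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]]
print(multi_head_attention(Q, K, V, n_heads=2))  # [[2.0, 3.0, 4.0, 5.0]]
```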
X — SemiAnalysis TIER_1 English(EN) · SemiAnalysis_ · 2026-06-29 01:00

Transformer’s Attention mechanism has come a long way. We’d like to thank the researchers and the engineers in the open-source community for continuing to make high-performance AI accessible. Please celebrate with us by sharing this post, tagging more contributors, and sharing ht…
