Nous Research Proposes Lighthouse Attention: A Training-Only Selection-Based Hierarchical Attention That Delivers 1.4–1.7× Pretraining Speedup at Long Context

Nous Research's Lighthouse Attention Speeds Up LLM Pretraining

By the PulseAugur editorial team · [3 sources] · 2026-05-16 22:23

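Researchers at Nous Research have developed Lighthouse Attention, a hierarchical attention mechanism designed to speed up pretraining of long-context large language models. The method pools queries, keys, and values symmetrically across a multi-level pyramid and achieves a 1.4x to 1.7x speedup over standard FlashAttention. Because Lighthouse Attention keeps its selection logic outside the attention kernel, it can use optimized dense attention kernels during training.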
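To make the idea of pooling queries, keys, and values together and then handing a short dense sub-sequence to an ordinary attention routine more concrete, here is a minimal pure-Python sketch. It is not the Nous Research implementation: the mean-pooling operator, the norm-based selection rule, the single-level selection, and every function name below are assumptions made for illustration, and causal masking is ignored.

```python
import math

def mean_pool(rows, factor):
    """Average consecutive groups of `factor` row vectors (assumed pooling operator)."""
    pooled = []
    for start in range(0, len(rows) - factor + 1, factor):
        group = rows[start:start + factor]
        pooled.append([sum(col) / factor for col in zip(*group)])
    return pooled

def build_pyramid(rows, levels, factor=2):
    """Level 0 is the full-resolution sequence; each later level is pooled by `factor`."""
    pyramid = [rows]
    for _ in range(levels - 1):
        pyramid.append(mean_pool(pyramid[-1], factor))
    return pyramid

def select_top(rows, keep):
    """Hypothetical selection rule: keep the `keep` rows with the largest squared norm."""
    ranked = sorted(range(len(rows)), key=lambda i: -sum(x * x for x in rows[i]))
    return sorted(ranked[:keep])  # restore sequence order

def dense_attention(q, k, v):
    """Plain softmax(QK^T / sqrt(d)) V, standing in for an optimized dense kernel."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total
                    for c in range(len(v[0]))])
    return out

def pooled_subsequence_attention(q, k, v, levels=3, level=1, keep=2):
    """Pool Q, K, V the same way, select outside the kernel, then attend densely."""
    q_pyr, k_pyr, v_pyr = (build_pyramid(x, levels) for x in (q, k, v))
    idx = select_top(k_pyr[level], keep)  # selection happens before the attention call
    q_sub = [q_pyr[level][i] for i in idx]
    k_sub = [k_pyr[level][i] for i in idx]
    v_sub = [v_pyr[level][i] for i in idx]
    return idx, dense_attention(q_sub, k_sub, v_sub)

# Toy input: 8 tokens, 2 features each.
q = [[float(i), 1.0] for i in range(8)]
k = [[float(i), 0.0] for i in range(8)]
v = [[1.0, 2.0] for _ in range(8)]
idx, out = pooled_subsequence_attention(q, k, v)
```

The point of the sketch is the division of labor: `select_top` runs as ordinary code before the attention call, so `dense_attention` only ever sees a short, contiguous batch of rows. That is the property the summary credits for letting the method reuse an optimized dense kernel such as FlashAttention.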

Impact: Speeds up pretraining of long-context LLMs, which may make developing advanced models more efficient.

Ranking rationale: This cluster covers a new research paper proposing a method to improve LLM training efficiency.


AI-generated summary · Google Gemini · from 3 sources.


Sources [3]

MarkTechPost TIER_1 English(EN) · Asif Razzaq · 2026-05-16 22:23

Nous Research Proposes Lighthouse Attention: A Training-Only Selection-Based Hierarchical Attention That Delivers 1.4–1.7× Pretraining Speedup at Long Context

Nous Research has published Lighthouse Attention, a selection-based hierarchical attention mechanism that wraps around standard scaled dot-product attention during pretraining and is removed afterward. Unlike prior methods such as NSA and HISA that pool only keys and values, L…
Mastodon — sigmoid.social TIER_1 English(EN) · [email protected] · 2026-05-16 23:51

Lighthouse Attention from Nous Research is a training-only, selection-based hierarchical attention method that achieves a 1.4-1.7x speedup in long-context pretraining

Lighthouse Attention from Nous Research is a training-only selection-based hierarchical attention method that achieves 1.4-1.7x pretraining speedup on long context tasks by pooling Q, K, V across a multi-resolution pyramid and running FlashAttention on a dense sub-sequence. https…

Link: marktechpost.com/…/nous-research-proposes…
Mastodon — sigmoid.social TIER_1 English(EN) · [email protected] · 2026-05-16 22:52

Nous Research introduces Lighthouse Attention, a selection-based hierarchical mechanism for long-context LLM pretraining that pools queries, keys, and values

Nous Research has introduced Lighthouse Attention, a selection-based hierarchical mechanism for long-context LLM pretraining that pools queries, keys and values across a multi-resolution pyramid. The approach achieves 1.4-1.7x wall-clock speedup against standard FlashAttention. h…

Link: marktechpost.com/…/nous-research-proposes…

