PulseAugur

Nous Research's Lighthouse Attention speeds up LLM pretraining

Researchers at Nous Research have developed Lighthouse Attention, a training-only, selection-based hierarchical attention mechanism designed to accelerate long-context pretraining of large language models. By pooling queries, keys, and values symmetrically across a multi-level pyramid, the method achieves a 1.4x to 1.7x wall-clock speedup over standard FlashAttention. Because Lighthouse Attention keeps the selection logic outside the attention kernel, training can run on existing optimized dense-attention kernels.
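The summary does not say which pooling operator Lighthouse Attention uses or how many levels its pyramid has. As a minimal sketch, assuming plain mean pooling with a factor of 2 per level, a symmetric Q/K/V pyramid could be built as follows. The names `mean_pool`, `build_pyramid`, and `pool_qkv` are hypothetical and are not taken from Nous Research's code.

```python
from typing import List, Tuple

Vector = List[float]
Pyramid = List[List[Vector]]


def mean_pool(seq: List[Vector], factor: int) -> List[Vector]:
    """Average each consecutive group of `factor` vectors.

    A trailing partial group is averaged over however many vectors it holds.
    """
    pooled = []
    for start in range(0, len(seq), factor):
        group = seq[start:start + factor]
        dim = len(group[0])
        pooled.append([sum(vec[d] for vec in group) / len(group) for d in range(dim)])
    return pooled


def build_pyramid(seq: List[Vector], levels: int, factor: int = 2) -> Pyramid:
    """Level 0 is the full-resolution sequence; each later level is pooled from the one before."""
    pyramid = [seq]
    for _ in range(levels - 1):
        pyramid.append(mean_pool(pyramid[-1], factor))
    return pyramid


def pool_qkv(q, k, v, levels: int, factor: int = 2) -> Tuple[Pyramid, Pyramid, Pyramid]:
    """Symmetric pooling: the same (assumed) mean-pooling rule is applied to Q, K, and V."""
    return (build_pyramid(q, levels, factor),
            build_pyramid(k, levels, factor),
            build_pyramid(v, levels, factor))
```

For an 8-token sequence with `levels=3`, each of the three pyramids has levels of length 8, 4, and 2. Applying one pooling rule to queries as well as keys and values is what distinguishes this from methods such as NSA and HISA, which, according to the MarkTechPost excerpt, pool only keys and values.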

Impact: Accelerates long-context LLM pretraining, potentially making the development of advanced models more efficient.

Ranking rationale: The cluster covers a new research paper that proposes a method for improving LLM training efficiency.

Read on MarkTechPost →

AI-generated summary · Google Gemini · from 3 sources.


Sources [3]

  1. MarkTechPost · Tier 1 · English (EN) · Asif Razzaq

    Nous Research Proposes Lighthouse Attention: A Training-Only, Selection-Based Hierarchical Attention Mechanism That Delivers a 1.4-1.7x Long-Context Pretraining Speedup

    Nous Research has published Lighthouse Attention, a selection-based hierarchical attention mechanism that wraps around standard scaled dot-product attention during pretraining and is removed afterward. Unlike prior methods such as NSA and HISA that pool only keys and values, L…

  2. Mastodon (sigmoid.social) · Tier 1 · English (EN) · [email protected]

    Nous Research's Lighthouse Attention is a training-only, selection-based hierarchical attention method that achieves a 1.4-1.7x speedup in long-context pretraining

    Lighthouse Attention from Nous Research is a training-only selection-based hierarchical attention method that achieves 1.4-1.7x pretraining speedup on long context tasks by pooling Q, K, V across a multi-resolution pyramid and running FlashAttention on a dense sub-sequence. https…

  3. Mastodon (sigmoid.social) · Tier 1 · English (EN) · [email protected]

    Nous Research launches Lighthouse Attention, a selection-based hierarchical mechanism for long-context LLM pretraining that pools queries, keys, and values

    Nous Research has introduced Lighthouse Attention, a selection-based hierarchical mechanism for long-context LLM pretraining that pools queries, keys and values across a multi-resolution pyramid. The approach achieves 1.4-1.7x wall-clock speedup against standard FlashAttention. h…
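The coverage describes a selection step that runs outside the attention kernel, followed by FlashAttention on a dense sub-sequence, with the whole wrapper removed after pretraining. The sketch below illustrates only that general pattern and is not Lighthouse Attention's actual selection rule. It substitutes a single-level top-k block selection, scored by a hypothetical dot product between block-mean queries and block-mean keys. It then gathers the chosen blocks into a contiguous sub-sequence and runs plain causal softmax attention on it as a stand-in for FlashAttention. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def block_mean(seq: List[Vector], start: int, size: int) -> Vector:
    """Mean of seq[start:start + size]; the slice must be non-empty."""
    block = seq[start:start + size]
    return [sum(row[d] for row in block) / len(block) for d in range(len(block[0]))]


def select_blocks(q, k, q_block: int, block_size: int, top_k: int) -> List[int]:
    """Selection step, kept separate from the attention computation.

    Hypothetical scoring rule: mean query of the current block dotted with
    the mean key of each earlier block (earlier blocks only, for causality).
    """
    q_summary = block_mean(q, q_block * block_size, block_size)
    earlier = list(range(q_block))
    earlier.sort(key=lambda b: dot(q_summary, block_mean(k, b * block_size, block_size)),
                 reverse=True)
    # Always keep the query's own block; return blocks in positional order.
    return sorted(earlier[:max(top_k - 1, 0)] + [q_block])


def dense_attention(q_rows, k_rows, v_rows, q_pos, k_pos) -> List[Vector]:
    """Causal softmax attention over a dense sub-sequence (stand-in for FlashAttention)."""
    scale = 1.0 / math.sqrt(len(q_rows[0]))
    out = []
    for q_vec, qp in zip(q_rows, q_pos):
        # Causal mask: each query sees only keys at or before its own position.
        visible = [(kv, vv) for kv, vv, kp in zip(k_rows, v_rows, k_pos) if kp <= qp]
        scores = [dot(q_vec, kv) * scale for kv, _ in visible]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        dim = len(visible[0][1])
        out.append([sum(w * vv[d] for w, (_, vv) in zip(weights, visible)) / z
                    for d in range(dim)])
    return out


def selected_block_attention(q, k, v, q_block: int, block_size: int, top_k: int):
    """Hypothetical training-time wrapper: select, gather densely, then attend."""
    blocks = select_blocks(q, k, q_block, block_size, top_k)
    k_pos = [b * block_size + i for b in blocks for i in range(block_size)
             if b * block_size + i < len(k)]
    q_pos = list(range(q_block * block_size, min((q_block + 1) * block_size, len(q))))
    gathered_k = [k[p] for p in k_pos]
    gathered_v = [v[p] for p in k_pos]
    out = dense_attention([q[p] for p in q_pos], gathered_k, gathered_v, q_pos, k_pos)
    return out, blocks
```

When `top_k` covers every block up to and including the query block, the gathered sub-sequence is the full causal prefix, and the output matches `dense_attention` over the whole sequence. That mirrors the training-only idea in the coverage: once pretraining ends, the wrapper can be dropped and ordinary dense attention used directly. Keeping selection in plain code and handing the kernel a contiguous dense input is the design choice that lets an unmodified dense kernel do the heavy lifting.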