PulseAugur
Nous Research Proposes Lighthouse Attention: A Training-Only Selection-Based Hierarchical Attention That Delivers 1.4–1.7× Pretraining Speedup at Long Context

Nous Research's Lighthouse Attention speeds up LLM pretraining

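Researchers at Nous Research have developed Lighthouse Attention, a new hierarchical attention mechanism designed to accelerate pretraining of long-context large language models. The method pools queries, keys, and values symmetrically across a multi-level pyramid and achieves a 1.4× to 1.7× speedup over standard FlashAttention. Lighthouse Attention keeps the selection logic outside the attention kernel, which lets it use optimized dense attention kernels during training.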
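To make the pipeline in the summary above concrete, here is a minimal pure-Python sketch of the three stages it names: pooling Q, K, and V into a pyramid, selecting positions outside the attention kernel, and running ordinary dense attention on the gathered sub-sequence. This is not Nous Research's implementation. The mean-pooling operator, the pooling factor of 2, the key-norm scoring rule, and every function name (`mean_pool`, `build_pyramid`, `select_top_k`, `pooled_select_attend`) are assumptions made for illustration, and the non-causal `dense_attention` function only stands in for a fused kernel such as FlashAttention.

```python
import math

def mean_pool(seq, factor=2):
    """One pyramid step: average consecutive groups of `factor` vectors (assumed operator)."""
    pooled = []
    for start in range(0, len(seq), factor):
        group = seq[start:start + factor]
        dim = len(group[0])
        pooled.append([sum(vec[d] for vec in group) / len(group) for d in range(dim)])
    return pooled

def build_pyramid(seq, levels, factor=2):
    """Level 0 is the input; each later level is a pooled copy of the previous one."""
    pyramid = [seq]
    for _ in range(levels - 1):
        pyramid.append(mean_pool(pyramid[-1], factor))
    return pyramid

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dense_attention(q, k, v):
    """Plain non-causal scaled dot-product attention; a stand-in for a fused dense kernel."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        logits = [dot(qi, kj) * scale for kj in k]
        peak = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(l - peak) for l in logits]
        total = sum(weights)
        dim = len(v[0])
        out.append([sum(w * vj[d] for w, vj in zip(weights, v)) / total for d in range(dim)])
    return out

def select_top_k(pooled_k, top_k):
    """Hypothetical scoring rule: keep the top_k pooled positions by key norm, in sequence order."""
    scores = [math.sqrt(dot(kv, kv)) for kv in pooled_k]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    return sorted(ranked)

def pooled_select_attend(q, k, v, level=1, top_k=2):
    """Pool Q, K, V symmetrically, select outside the kernel, then attend densely."""
    qp = build_pyramid(q, level + 1)[level]
    kp = build_pyramid(k, level + 1)[level]
    vp = build_pyramid(v, level + 1)[level]
    idx = select_top_k(kp, top_k)       # selection happens before the attention call
    q_sel = [qp[i] for i in idx]        # gather a short, dense sub-sequence
    k_sel = [kp[i] for i in idx]
    v_sel = [vp[i] for i in idx]
    return idx, dense_attention(q_sel, k_sel, v_sel)

# Usage: 8 tokens of dimension 2, pooled once to 4 positions, 2 of which are kept.
q = [[float(i), 1.0] for i in range(8)]
k = [[float(i), 1.0] for i in range(8)]
v = [[1.0, 2.0] for _ in range(8)]
selected, output = pooled_select_attend(q, k, v, level=1, top_k=2)
```

The point of the sketch is the ordering: because selection and gathering finish before `dense_attention` is called, the attention step sees only a short contiguous sequence and needs no sparsity-aware kernel.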

Impact: Speeds up pretraining of long-context LLMs, which may make the development of advanced models more efficient.

Ranking rationale: This cluster covers a new research paper that proposes a method for improving LLM training efficiency.


AI-generated summary · Google Gemini · from 3 sources.

Coverage sources [3]

  1. MarkTechPost TIER_1 English(EN) · Asif Razzaq ·

    Nous Research Proposes Lighthouse Attention: A Training-Only Selection-Based Hierarchical Attention That Delivers 1.4–1.7× Pretraining Speedup at Long Context

    Nous Research has published Lighthouse Attention, a selection-based hierarchical attention mechanism that wraps around standard scaled dot-product attention during pretraining and is removed afterward. Unlike prior methods such as NSA and HISA that pool only keys and values, L…

  2. Mastodon — sigmoid.social TIER_1 English(EN) · [email protected] ·

    Nous Research's Lighthouse Attention is a training-only, selection-based hierarchical attention method that achieves a 1.4-1.7x speedup in long-context pretraining

    Lighthouse Attention from Nous Research is a training-only selection-based hierarchical attention method that achieves 1.4-1.7x pretraining speedup on long context tasks by pooling Q, K, V across a multi-resolution pyramid and running FlashAttention on a dense sub-sequence. https…

  3. Mastodon — sigmoid.social TIER_1 English(EN) · [email protected] ·

    Nous Research introduces Lighthouse Attention, a selection-based hierarchical mechanism for long-context LLM pretraining that pools queries, keys, and values

    Nous Research has introduced Lighthouse Attention, a selection-based hierarchical mechanism for long-context LLM pretraining that pools queries, keys and values across a multi-resolution pyramid. The approach achieves 1.4-1.7x wall-clock speedup against standard FlashAttention. h…