Nous Research's Lighthouse Attention speeds up LLM pretraining

By PulseAugur Editorial · [3 sources] · 2026-05-16 22:23

Researchers at Nous Research have developed Lighthouse Attention, a hierarchical attention mechanism designed to accelerate long-context pretraining of large language models. It pools queries, keys, and values symmetrically across a multi-level pyramid and reports a 1.4x to 1.7x wall-clock speedup over standard FlashAttention. Because the selection logic sits outside the attention kernel, the method can reuse optimized dense-attention kernels during training.
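To make the idea of pooling queries, keys, and values and then running dense attention on a selected sub-sequence more concrete, here is a minimal pure-Python sketch. It is not the Nous Research implementation: the mean pooling, the pooling factor of 2, the top-k scoring rule, the choice of probe query, and every function name below are illustrative assumptions, and causal masking is ignored for brevity.

```python
import math

def mean_pool(seq, factor=2):
    """Average consecutive groups of `factor` vectors (the last group may be shorter)."""
    pooled = []
    for start in range(0, len(seq), factor):
        group = seq[start:start + factor]
        dim = len(group[0])
        pooled.append([sum(vec[d] for vec in group) / len(group) for d in range(dim)])
    return pooled

def build_pyramid(seq, levels, factor=2):
    """Level 0 is the input; each further level pools the level below it."""
    pyramid = [seq]
    for _ in range(levels):
        pyramid.append(mean_pool(pyramid[-1], factor))
    return pyramid

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def select_top_k(probe, keys, k):
    """Selection step, kept outside the attention routine (assumed scoring rule):
    indices of the k keys with the highest dot product, in sequence order."""
    ranked = sorted(range(len(keys)), key=lambda i: dot(probe, keys[i]), reverse=True)
    return sorted(ranked[:k])

def dense_attention(queries, keys, values):
    """Plain scaled dot-product attention, standing in for an optimized dense kernel."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        scores = [dot(q, key) * scale for key in keys]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * val[d] for w, val in zip(weights, values)) / total
            for d in range(len(values[0]))
        ])
    return outputs

def pooled_selected_attention(q, k, v, levels=2, level=1, top_k=2):
    """Hypothetical end-to-end step: pool Q, K, V the same way, select positions
    at one pyramid level, then run dense attention on the gathered sub-sequence."""
    q_pyr = build_pyramid(q, levels)
    k_pyr = build_pyramid(k, levels)
    v_pyr = build_pyramid(v, levels)
    probe = q_pyr[-1][-1]  # coarsest summary of the most recent queries (assumption)
    chosen = select_top_k(probe, k_pyr[level], top_k)
    sub_q = [q_pyr[level][i] for i in chosen]
    sub_k = [k_pyr[level][i] for i in chosen]
    sub_v = [v_pyr[level][i] for i in chosen]
    return chosen, dense_attention(sub_q, sub_k, sub_v)

# Toy inputs: 8 positions, 2-dimensional vectors.
toy_q = [[float(i), 1.0] for i in range(8)]
toy_k = [[1.0, float(i)] for i in range(8)]
toy_v = [[float(i), float(-i)] for i in range(8)]
chosen, out = pooled_selected_attention(toy_q, toy_k, toy_v)
print("selected positions:", chosen)
print("attention output rows:", len(out))
```

The design point this illustrates is that the attention routine itself never sees the selection logic: it receives a short, dense sequence and can therefore be swapped for any standard dense kernel.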

IMPACT Accelerates LLM pretraining for long contexts, potentially enabling more efficient development of advanced models.

RANK_REASON The cluster describes a new research paper proposing a novel method for improving LLM training efficiency.

AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

MarkTechPost TIER_1 English(EN) · Asif Razzaq · 2026-05-16 22:23

Nous Research Proposes Lighthouse Attention: A Training-Only Selection-Based Hierarchical Attention That Delivers 1.4–1.7× Pretraining Speedup at Long Context

Nous Research has published Lighthouse Attention, a selection-based hierarchical attention mechanism that wraps around standard scaled dot-product attention during pretraining and is removed afterward. Unlike prior methods such as NSA and HISA that pool only keys and values, L…
Mastodon — sigmoid.social TIER_1 English(EN) · [email protected] · 2026-05-16 23:51

Lighthouse Attention from Nous Research is a training-only selection-based hierarchical attention method that achieves 1.4-1.7x pretraining speedup on long context tasks by pooling Q, K, V across a multi-resolution pyramid and running FlashAttention on a dense sub-sequence. https…

LINKS marktechpost.com/…/nous-research-proposes…
Mastodon — sigmoid.social TIER_1 English(EN) · [email protected] · 2026-05-16 22:52

Nous Research has introduced Lighthouse Attention, a selection-based hierarchical mechanism for long-context LLM pretraining that pools queries, keys and values across a multi-resolution pyramid. The approach achieves 1.4-1.7x wall-clock speedup against standard FlashAttention. h…

LINKS marktechpost.com/…/nous-research-proposes…
