PulseAugur
English(EN) Nous Research Releases Token Superposition Training to Speed Up LLM Pre-Training by Up to 2.5x Across 270M to 10B Parameter Models

Nous Research Cuts LLM Pre-Training Time by Up to 2.5x with Token Superposition

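Nous Research has developed Token Superposition Training (TST), a method intended to significantly speed up the pre-training of large language models (LLMs). The technique can cut pre-training time by up to 2.5x for models ranging from 270 million to 10 billion parameters, without changing the model's architecture or how inference works. TST modifies the training loop in two phases: an initial "superposition" phase, in which token embeddings are averaged and processed in larger batches, followed by a "recovery" phase that returns to standard training. Experiments show that, compared with conventional methods, TST reaches a lower final training loss while using far less compute time.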
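The averaging step in the superposition phase can be pictured with a small sketch. This is an illustration under stated assumptions, not Nous Research's implementation: the helper names `superpose` and `phase_for_step`, the `bag_size` and `superposition_fraction` parameters, and the handling of a short trailing bag are all hypothetical, and the summary does not say how targets or the loss are computed for averaged positions.

```python
from typing import List

Vector = List[float]


def superpose(embeddings: List[Vector], bag_size: int) -> List[Vector]:
    """Average each run of `bag_size` contiguous token embeddings into one vector.

    Assumption: a trailing run shorter than `bag_size` is averaged over its
    own length. The source does not describe this edge case.
    """
    if bag_size < 1:
        raise ValueError("bag_size must be >= 1")
    bags: List[Vector] = []
    for start in range(0, len(embeddings), bag_size):
        chunk = embeddings[start:start + bag_size]
        dim = len(chunk[0])
        # Element-wise mean over the embeddings in this chunk.
        bags.append([sum(vec[d] for vec in chunk) / len(chunk) for d in range(dim)])
    return bags


def phase_for_step(step: int, total_steps: int, superposition_fraction: float) -> str:
    """Return the phase a training step falls in under a two-phase schedule.

    Assumption: the switch happens at a fixed fraction of total steps.
    """
    switch_step = int(total_steps * superposition_fraction)
    return "superposition" if step < switch_step else "recovery"
```

With `bag_size=2`, a sequence of four token embeddings becomes two averaged positions, so each sequence presented to the model is half as long during the superposition phase. In the recovery phase, the training loop would skip `superpose` and feed ordinary token embeddings.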

Impact: Speeds up LLM pre-training, potentially lowering the compute cost and time needed to develop new large language models.

Ranking rationale: A research paper detailing a novel method for accelerating LLM pre-training.

AI-generated summary · Google Gemini · From 4 sources.

Sources [4]

  1. MarkTechPost TIER_1 English(EN) · Asif Razzaq ·

    Nous Research Releases Token Superposition Training to Speed Up LLM Pre-Training by Up to 2.5x Across 270M to 10B Parameter Models

    Nous Research releases Token Superposition Training (TST), a two-phase pre-training method that cuts wall-clock training time by up to 2.5x at matched FLOPs by averaging contiguous token embeddings into bags during Phase 1 and reverting to standard next-token prediction in Pha…

  2. Mastodon — mastodon.social TIER_1 English(EN) · [email protected] ·

    Nous Research has released Token Superposition Training, a technique that speeds up LLM pre-training by up to 2.5x across models from 270M to 10B parameters. The approach could reduce compute costs significantly for AI labs. https://www.marktechpost.com/2026/05/13/nous-research…

  3. Mastodon — mastodon.social TIER_1 English(EN) · aihaberleri ·

    📰 Token Superposition Training: Nous Research Speeds LLM Pre-Training 2.5x in 2026 Nous Research has unveiled Token Superposition Training (TST), a novel two-phase method that accelerates large language model pre-training by up to 2.5 times without altering model architecture or …

  4. Mastodon — mastodon.social TIER_1 Türkçe(TR) · aihaberleri ·

    📰 LLM Training Accelerates 250% with Token Superposition (2026) Nous Research has announced Token Superposition Training, a groundbreaking method that speeds up the pre-training of large language models (LLMs) from 270M to 10B parameters by up to 2.5x. This technique, the existing superposition theo…