Researchers have developed a new pre-training method called Token-Superposition Training (TST) that aims to make large language model training more efficient. TST involves a two-phase process: an initial superposition phase where tokens are combined and trained with a multi-hot cross-entropy objective, followed by a recovery phase of standard training. Evaluations on models up to 10 billion parameters show TST can reduce pre-training time by up to 2.5x under equal-loss conditions.
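To make the superposition phase more concrete, here is a minimal sketch of what combining tokens and scoring them with a multi-hot cross-entropy objective could look like. The summary does not say how tokens are combined or how the loss is normalized, so both choices here (averaging the embeddings of a group of tokens, and averaging the negative log-likelihood over the target tokens) are assumptions, and the function names `superpose_embeddings` and `multi_hot_cross_entropy` are hypothetical rather than taken from the paper.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def superpose_embeddings(embeddings, token_ids):
    """Combine a group of tokens into one input vector.

    ASSUMPTION: the combination is a plain average of the token
    embeddings; the source only says tokens are "combined".
    """
    dim = len(embeddings[0])
    n = len(token_ids)
    return [sum(embeddings[t][d] for t in token_ids) / n for d in range(dim)]


def multi_hot_cross_entropy(logits, target_ids):
    """Cross-entropy against a multi-hot target.

    ASSUMPTION: the target puts equal mass on every token in the group,
    so the loss is the mean negative log-probability of those tokens.
    With a single target id this reduces to standard cross-entropy.
    """
    log_probs = log_softmax(logits)
    return -sum(log_probs[t] for t in target_ids) / len(target_ids)


# Toy usage: a vocabulary of 4 tokens with 2-dimensional embeddings.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
combined_input = superpose_embeddings(toy_embeddings, [0, 1])
loss = multi_hot_cross_entropy([0.0, 0.0, 0.0, 0.0], [0, 1])
```

Under these assumptions, the recovery phase would correspond to calling the same loss with a single target token per position, which is ordinary next-token cross-entropy.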
Impact: This method could significantly reduce the computational cost and time required for training large language models, potentially accelerating research and development.
Ranking rationale: The cluster contains an academic paper detailing a new method for pre-training large language models.
AI-generated summary · Google Gemini · from 2 sources.