Mixtures of Subspaces for Bandwidth Efficient Context Parallel Training

New method enables efficient decentralized LLM training with 100K+ token contexts

By the PulseAugur editorial team · [1 source] · 2026-06-16 04:00

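Researchers have developed a new method for training large language models with extended context windows in decentralized settings. The technique, called Mixtures of Subspaces, sharply reduces communication overhead by exploiting the low-rank structure of activation outputs. It achieves a compression rate above 95% with negligible loss in convergence, which makes it possible to train billion-parameter models with context lengths beyond 100,000 tokens even over slow networks. Its convergence speed is comparable to that of centralized models running on high-speed interconnects, making decentralized training more practical.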
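To make the idea of low-rank activation compression concrete, here is a minimal sketch. It is not the paper's algorithm, whose details are not given in this summary. It only assumes that each activation vector of width `hidden_dim` lies close to a subspace of dimension `rank`, so a device can send `rank` coefficients instead of the full vector. The function names, the toy basis, and the sizes 4096 and 128 are all hypothetical.

```python
# Illustrative only: generic low-rank projection, not the Mixtures of Subspaces method itself.

def compression_rate(hidden_dim: int, rank: int) -> float:
    """Fraction of payload saved by sending `rank` coefficients per token
    instead of `hidden_dim` raw activation values."""
    return 1.0 - rank / hidden_dim


def project(x, basis):
    """Coefficients of x in an orthonormal basis (list of `rank` vectors)."""
    return [sum(b_i * x_i for b_i, x_i in zip(b, x)) for b in basis]


def reconstruct(coeffs, basis):
    """Rebuild the full-width vector from its subspace coefficients."""
    width = len(basis[0])
    return [sum(c * b[j] for c, b in zip(coeffs, basis)) for j in range(width)]


# Hypothetical toy setup: a 2-dimensional subspace of a 4-dimensional activation space.
BASIS = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]

if __name__ == "__main__":
    activation = [2.5, -0.5, 2.5, -0.5]      # lies exactly in the span of BASIS
    sent = project(activation, BASIS)         # what would go over the network
    received = reconstruct(sent, BASIS)       # what the peer would rebuild
    print(sent, received)
    print(compression_rate(4096, 128))        # hypothetical sizes
```

With these assumed sizes, sending 128 coefficients in place of 4096 values saves about 97% of the payload, which is in the range the summary reports. Real activations are only approximately low-rank, so reconstruction would carry some error that this toy example does not show.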

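Impact: Makes it possible to train large language models with very long context windows in decentralized settings, which could lower infrastructure costs and broaden access.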

Ranking rationale: This cluster contains an academic paper that details a new research method.


Sameera Ramasinghe

AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.LG · Sameera Ramasinghe, Ajanthan Thalaiyasingam, Hadi Mohaghegh Dolatabadi, Gil Avraham, Violetta Shevchenko, Yan Zuo, Chamin Hewa Koneputugodage, Alexander Long · 2026-06-16 04:00

Mixtures of Subspaces for Bandwidth Efficient Context Parallel Training

arXiv:2606.16384v1 Announce Type: new Abstract: Pretraining language models with extended context windows enhances their ability to leverage rich information during generation. Existing methods split input sequences into chunks, broadcast them across multiple devices, and compute…
