Researchers have developed SPES, a decentralized framework for pretraining large language models built on Mixture-of-Experts (MoE) architectures. The method reduces memory requirements by training only a subset of experts on each node and synchronizing knowledge across distributed GPUs, even over internet connections. SPES has been used to train models of up to 9 billion parameters, achieving performance comparable to centrally trained models within similar computational budgets.
AI impact: Introduces a memory-efficient decentralized training paradigm that could lower the hardware barrier for developing large language models.
Ranking rationale: Academic paper detailing a new method for distributed LLM pretraining.
AI-generated summary · Google Gemini · from 1 source.