Researchers explore growing Transformer models through modular composition and layer-wise expansion

By PulseAugur Editorial · [1 source] · 2026-05-05 04:00

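Researchers have explored a method for training Transformer models by incrementally adding new layers on top of a frozen base model while keeping the budget of trainable parameters roughly constant. The approach, called "Growing Transformers," shows that new modules can be trained effectively even when only a small fraction of the model's parameters is updated. Even under a highly constrained token interface, a 16-layer model achieved a notable MMLU score. This suggests that continual learning under a parameter budget is feasible, though at some cost in final perplexity compared with monolithic training.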
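The bookkeeping behind "freeze what is trained, add a new layer, keep the trainable budget flat" can be sketched in a few lines of plain Python. This is only an illustrative toy, not the paper's implementation: the `Block` and `GrowingStack` names, the choice to add exactly one block per stage, and the figure of 1,000,000 parameters per block are all assumptions made for the example, and the fixed token interface is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical stand-in for one Transformer block (parameter count only)."""
    n_params: int
    trainable: bool = True


@dataclass
class GrowingStack:
    """Toy model that grows one block at a time (an assumption for this sketch)."""
    blocks: list = field(default_factory=list)

    def grow(self, n_params: int) -> None:
        # Freeze every previously trained block, then append a new trainable one.
        for block in self.blocks:
            block.trainable = False
        self.blocks.append(Block(n_params))

    def trainable_params(self) -> int:
        return sum(b.n_params for b in self.blocks if b.trainable)

    def total_params(self) -> int:
        return sum(b.n_params for b in self.blocks)


def simulate(depth: int, params_per_block: int) -> list:
    """Return (depth, trainable params, total params) after each growth stage."""
    stack = GrowingStack()
    history = []
    for _ in range(depth):
        stack.grow(params_per_block)
        # A real setup would train only the newest block at this point.
        history.append(
            (len(stack.blocks), stack.trainable_params(), stack.total_params())
        )
    return history


# Arbitrary illustrative numbers: 16 stages, 1,000,000 parameters per block.
history = simulate(depth=16, params_per_block=1_000_000)
for n_blocks, trainable, total in history:
    print(f"{n_blocks:2d} blocks: {trainable:,} trainable of {total:,} total")
```

In this toy, the trainable count stays the same at every stage while the total grows linearly with depth, which is the sense in which only a small fraction of the parameters is updated in the later stages.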

Impact: This research offers a potential path toward more parameter-efficient model scaling and continual learning.

Ranking rationale: The cluster contains one arXiv paper detailing a novel training method for Transformer models.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL TIER_1 English(EN) · A. Bochkov · 2026-05-05 04:00

Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate

arXiv:2507.07129v3 Announce Type: replace-cross Abstract: We study a constrained training regime for decoder-only Transformers in which the token interface is fixed, previously trained dense blocks are not reopened, and the active trainable parameter set is kept approximately con…
