Most Transformer Modifications Still Do Not Transfer at 1-3B: A 2020-2026 Update to Narang et al. (2021) with Downstream Evaluation and a Noise Floor

Study Finds Transformer Modifications Do Not Transfer at the 1-3B Parameter Scale

By the PulseAugur Editorial Team · [1 source] · 2026-05-20 06:43

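A recent study re-evaluated the effectiveness of Transformer model modifications and found that most still yield no significant improvement when scaled to 1 to 3 billion parameters. The researchers tested 20 modifications introduced after 2021, using downstream evaluation metrics and controlling for data, compute, and training recipe. The results largely echo a 2021 study: only a few modifications showed a benefit, and one of those proved unstable at larger scale. The study stresses the need for rigorous reporting, downstream evaluation, and cross-scale stability testing in architecture comparisons.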

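Impact: Confirms that architectural innovations for large language models often fail to scale effectively, which points to the need for more robust evaluation methods.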

Ranking rationale: Academic paper presenting new research findings on the effectiveness of model architectures.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL · Jie Zhou · 2026-05-20 06:43

Most Transformer Modifications Still Do Not Transfer at 1-3B: A 2020-2026 Update to Narang et al. (2021) with Downstream Evaluation and a Noise Floor

Narang et al. (2021) evaluated 40+ Transformer modifications at T5-base scale and concluded that most did not transfer. Five years later, the typical working regime has moved to 1-3B parameters, downstream evaluation has replaced pretraining perplexity, and a substantially differ…

