Neural Scaling Universality: If Exponents Are Fixed, Time to Understand Coefficients

Paper argues that neural scaling laws are governed by fixed exponents

By the PulseAugur editorial team · [2 sources] · 2026-06-23 17:46

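A new position paper argues that the exponents of neural scaling laws, the power laws describing how pre-training loss falls with training time, model size, and compute, are fixed. It attributes these exponents to generic mechanisms such as the nonlinearity of Softmax, representation superposition, and ensemble averaging in Transformer layers. The paper contends that while the exponents are universal, the coefficients are sensitive to data and architecture, and that understanding these coefficients is essential both for near-term performance gains and for identifying improved universality classes.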
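To make the exponent/coefficient distinction concrete, here is a minimal Python sketch. It assumes a pure power law in training time, L(t) = A * t^(-1/3), using the one-third time exponent quoted in the source abstract and ignoring any irreducible-loss offset. The function names, the step counts, and the two coefficient values are hypothetical and chosen only for illustration; none of them come from the paper.

```python
import math

# Assumption: loss follows L(t) = A * t**(-alpha) with no irreducible-loss term.
ALPHA_TIME = 1.0 / 3.0  # one-third time exponent, as quoted in the abstract


def predict_loss(coefficient, step, alpha=ALPHA_TIME):
    """Loss predicted by the assumed power law at a given training step."""
    return coefficient * step ** (-alpha)


def fit_coefficient(steps, losses, alpha=ALPHA_TIME):
    """Least-squares estimate of A in log space, holding the exponent fixed.

    log L = log A - alpha * log t, so log A is the mean of (log L + alpha * log t).
    """
    terms = [math.log(l) + alpha * math.log(t) for t, l in zip(steps, losses)]
    return math.exp(sum(terms) / len(terms))


# Synthetic, made-up loss curves for two hypothetical setups (not paper data).
steps = [1_000, 8_000, 64_000, 512_000]
run_a = [predict_loss(12.0, t) for t in steps]
run_b = [predict_loss(9.0, t) for t in steps]

coef_a = fit_coefficient(steps, run_a)
coef_b = fit_coefficient(steps, run_b)

# With a shared exponent, the loss ratio between setups is the same at every step
# and equals the ratio of their coefficients.
loss_ratio = predict_loss(coef_b, 10**6) / predict_loss(coef_a, 10**6)
print(f"A_a={coef_a:.3f}  A_b={coef_b:.3f}  loss ratio b/a={loss_ratio:.3f}")
```

Under these assumptions, two setups that share the exponent differ only by a constant factor in loss at any training time, which is why the summary treats the coefficient as the lever for near-term gains.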

Impact: Offers a theoretical framework for understanding and optimizing the development of future large language models.

Ranking rationale: The cluster contains an academic paper on the theoretical aspects of neural scaling laws.


AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv cs.CL TIER_1 English(EN) · Yizhou Liu, Jeff Gore · 2026-06-25 04:00

Neural Scaling Universality: If Exponents Are Fixed, Time to Understand Coefficients

arXiv:2606.25008v1 Announce Type: cross Abstract: Neural scaling laws describe how pre-training loss decays as power laws with training time, model size, and compute. This position paper argues that the exponents of these power laws are fixed by generic mechanisms: a one-third ti…

arXiv cs.CL TIER_1 English(EN) · Jeff Gore · 2026-06-23 17:46

Neural Scaling Universality: If Exponents Are Fixed, Time to Understand Coefficients

Neural scaling laws describe how pre-training loss decays as power laws with training time, model size, and compute. This position paper argues that the exponents of these power laws are fixed by generic mechanisms: a one-third time scaling due to the strong nonlinearity of Softm…

