Scaling Rules for Gated Delta Networks Improve LLM Training Stability

By PulseAugur Editorial Team · [2 sources] · 2026-06-02 00:00

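Researchers have developed new scaling rules for Gated Delta Networks, a neural network architecture. The rules are derived through a method called coordinate-size estimation propagation and allow learning rates to transfer stably across model widths. Language-model pretraining experiments show that, unlike standard parametrization, these configurations improve training stability with optimizers such as AdamW and SGD.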

Impact: Enables more stable and efficient training of larger language models by improving hyperparameter tuning across model sizes.

Ranking rationale: This cluster contains an academic paper detailing a neural network architecture and a new training method.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Yifeng Liu, Quanquan Gu · 2026-06-04 04:00

Unlocking Feature Learning in Gated Delta Networks at Scale

arXiv:2606.04048v1 Announce Type: cross Abstract: Training and scaling Large Language Models demand enormous computational resources, motivating both efficient sub-quadratic architectures and principled hyperparameter tuning methods. While the Maximal Update Parametrization ($\mu…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-02 00:00

Unlocking Feature Learning in Gated Delta Networks at Scale

Scaling rules for Gated Delta Networks are derived through coordinate-size estimation propagation, enabling stable learning-rate transfer across model widths with both AdamW and SGD optimizers.