Scaling Adaptive Depth with Norm-Agnostic Residual Networks

New Norm-Agnostic Residual Network Architecture Enables Deeper Models

By the PulseAugur Editorial Team · [1 source] · 2026-06-16 04:00

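Researchers have developed a new neural network architecture called NAG (Norm-Agnostic). It addresses a limitation of residual networks: the norm of the residual stream grows with depth, so each later layer has a diminishing influence on it. NAG separates magnitude and direction information, which lets meaningful layer contributions persist and allows deeper models to be trained efficiently with a negligible increase in parameters. The architecture also introduces an interpretable Mixture-of-Depths (MoD) mechanism that adaptively skips layers, usable either as a post-training accuracy-compute trade-off or as a scaling strategy during pretraining. Experiments show that NAG outperforms baseline Transformers at larger depths, and that MoD can reach comparable performance by reinvesting the saved compute into training on more tokens.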
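The norm-growth problem described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's NAG method: each layer's output is modeled as an independent random vector of fixed norm `scale`, which real layers are not, and all helper names (`random_unit_update`, `relative_update_sizes`, `token_multiplier`) are hypothetical. In a plain residual stream, each layer's update shrinks relative to the accumulated stream, roughly as 1/sqrt(l + 1) in this random-direction setting. The summary does not say how NAG splits magnitude from direction, so that part is deliberately not modeled. The last function is a back-of-envelope estimate of how skipped layer compute could be reinvested in more tokens, assuming cost is linear in executed layer evaluations.

```python
import math
import random


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def random_unit_update(dim, scale, rng):
    """Hypothetical stand-in for one layer's output: a random direction
    with norm `scale`. Assumption: real layer outputs are not independent."""
    u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    n = norm(u)
    return [scale * x / n for x in u]


def relative_update_sizes(depth, dim=256, scale=1.0, seed=0):
    """Plain residual stream x_{l+1} = x_l + f_l (no magnitude/direction split).
    Returns ||f_l|| / ||x_l|| for each layer l: how large each layer's
    update is relative to the stream it is added to."""
    rng = random.Random(seed)
    x = random_unit_update(dim, scale, rng)  # embedding, norm == scale
    ratios = []
    for _ in range(depth):
        f = random_unit_update(dim, scale, rng)
        ratios.append(norm(f) / norm(x))
        x = [a + b for a, b in zip(x, f)]  # residual addition
    return ratios


def token_multiplier(skip_fraction):
    """Back-of-envelope for reinvesting skipped compute into tokens.
    Assumes cost is linear in executed layer evaluations and ignores router
    overhead: skipping a fraction s of them lets a fixed budget cover
    1 / (1 - s) times as many tokens."""
    if not 0.0 <= skip_fraction < 1.0:
        raise ValueError("skip_fraction must be in [0, 1)")
    return 1.0 / (1.0 - skip_fraction)


ratios = relative_update_sizes(depth=64)
for layer in (0, 3, 15, 63):
    # With independent directions, ||x_l|| grows roughly like sqrt(l + 1),
    # so the relative update shrinks roughly like 1 / sqrt(l + 1).
    print(f"layer {layer:2d}: ||f||/||x|| = {ratios[layer]:.3f}")
print(token_multiplier(0.25))
```

With `depth=64`, the printed ratio falls from about 1.0 at layer 0 to roughly 0.125 at layer 63. If the updates were aligned with the stream instead of random, the norm would grow linearly and the ratio would decay faster, as 1/(l + 1).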

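Impact: By addressing a fundamental limitation in scaling residual networks, NAG enables the training of deeper and more efficient models.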

Ranking rationale: This cluster contains an academic paper detailing a new model architecture.

Read on arXiv cs.AI →


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Tomás Figliolia, Beren Millidge · 2026-06-16 04:00

Scaling Adaptive Depth with Norm-Agnostic Residual Networks

arXiv:2606.16112v1 Announce Type: cross Abstract: Residual architectures are ubiquitous in deep learning, but they suffer from a subtle structural limitation: the norm of the residual stream can grow rapidly with depth. As a result, updates from later layers become small relative…

