Bug or Feature$^2$: Weight Drift, Activation Sparsity and Spikes

Neural network weight drift identified as a training dynamics issue

By the PulseAugur editorial team · [1 source] · 2026-05-22 04:00

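Researchers have identified a phenomenon in neural networks called "weight drift", in which the optimization process inadvertently pushes weights toward negative values. The drift is independent of the training data and appears with standard loss functions and common activation functions such as ReLU and GELU. The study indicates that the drift produces significant activation sparsity, may affect model accuracy, and amplifies activation spikes in Transformer layers.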
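To make the link between negative weights and activation sparsity concrete, here is a toy sketch. It is not the paper's method or experimental setup; it only assumes a single ReLU layer fed with non-negative inputs (as if they came from an earlier ReLU), and every name in it (`activation_sparsity`, `shifted`, and so on) is hypothetical. Shifting all weights by a negative constant lowers every pre-activation, so the fraction of zero outputs cannot decrease.

```python
import random


def layer_preactivations(weights, inputs):
    """Compute z = W x for one input vector (weights: one row per output unit)."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def activation_sparsity(weights, batch):
    """Fraction of ReLU outputs that are exactly zero across a batch."""
    zeros = 0
    total = 0
    for inputs in batch:
        for z in layer_preactivations(weights, inputs):
            if z <= 0.0:  # ReLU(z) == 0 exactly when z <= 0
                zeros += 1
            total += 1
    return zeros / total


def make_weights(n_out, n_in, rng, std=0.1):
    """Zero-mean Gaussian weights (an illustrative initialisation, not the paper's)."""
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]


def shifted(weights, shift):
    """Add the same constant to every weight, imitating a uniform drift."""
    return [[w + shift for w in row] for row in weights]


rng = random.Random(0)
# Assumption: inputs are non-negative, as if produced by an earlier ReLU layer.
batch = [[abs(rng.gauss(0.0, 1.0)) for _ in range(32)] for _ in range(64)]
base = make_weights(16, 32, rng)

for shift in (0.0, -0.05, -0.1):
    sparsity = activation_sparsity(shifted(base, shift), batch)
    print(f"weight shift {shift:+.2f} -> ReLU sparsity {sparsity:.2f}")
```

The same base weights are reused for every shift, so any change in the printed sparsity comes from the shift alone rather than from resampling.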

Impact: Identifies a fundamental training dynamic that may affect model performance and efficiency across architectures.

Ranking rationale: An academic paper detailing a newly identified phenomenon in neural network training dynamics.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.LG · Egor Shvetsov, Aleksandr Serkov, Shokorov Viacheslav, Redko Dmitry, Vladislav Goloshchapov, Evgeny Burnaev · 2026-05-22 04:00

Bug or Feature$^2$: Weight Drift, Activation Sparsity and Spikes

arXiv:2605.17659v2 Announce Type: replace Abstract: The design of modern neural architectures has converged through incremental empirical choices, yet the mechanisms governing their training dynamics remain only partially understood. We identify and analyze a negative weight drif…
