Researchers have identified a phenomenon called "weight drift" in neural networks, in which optimization inadvertently pushes weights toward negative values. The drift is independent of the training data and occurs with standard loss functions and common activation functions such as ReLU and GELU. The study shows that it can produce substantial activation sparsity, potentially reducing model accuracy, and can also amplify activation spikes in transformer layers.
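The summary links negative weight drift to activation sparsity. The Python sketch below illustrates that link under assumptions that do not come from the paper. The layer's inputs are non-negative, as if produced by a preceding ReLU. Every weight is shifted by the same negative amount, `weight_shift`. The function `relu_sparsity` and all of its parameters are hypothetical. The code does not model how drift arises during training. It only shows how a negative weight shift changes the fraction of ReLU outputs that are zero.

```python
import random


def relu_sparsity(weight_shift: float, fan_in: int = 128, n_units: int = 128,
                  n_samples: int = 32, seed: int = 0) -> float:
    """Return the fraction of ReLU outputs that are exactly zero.

    Hypothetical setup (an assumption, not the paper's experiment):
    weights ~ N(weight_shift, 1/fan_in), inputs ~ |N(0, 1)| so they are
    non-negative, like the output of a previous ReLU layer.
    """
    rng = random.Random(seed)
    std = (1.0 / fan_in) ** 0.5
    # One weight row per output unit; every weight shares the same mean shift.
    weights = [[rng.gauss(weight_shift, std) for _ in range(fan_in)]
               for _ in range(n_units)]
    zeros = 0
    total = 0
    for _ in range(n_samples):
        # Non-negative input vector.
        x = [abs(rng.gauss(0.0, 1.0)) for _ in range(fan_in)]
        for w in weights:
            pre = sum(wi * xi for wi, xi in zip(w, x))
            activation = max(pre, 0.0)  # ReLU
            if activation == 0.0:
                zeros += 1
            total += 1
    return zeros / total


if __name__ == "__main__":
    for shift in (0.0, -0.01, -0.05):
        print(f"weight_shift={shift:+.2f}  sparsity={relu_sparsity(shift):.3f}")
```

With non-negative inputs, a negative mean weight lowers each pre-activation by roughly `weight_shift * sum(x)`. As the shift grows more negative, the measured sparsity should therefore rise from about one half toward nearly all zeros. With GELU, small negative pre-activations are not exactly zero, so this exact-zero count applies only to ReLU.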
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Identifies a fundamental training dynamic that could affect model performance and efficiency across various architectures.
RANK_REASON Academic paper detailing a newly identified phenomenon in neural network training dynamics.