PulseAugur / Brief

last 24h
[1/1] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. Bug or Feature$^2$: Weight Drift, Activation Sparsity and Spikes

    Researchers have identified a phenomenon called "weight drift" in neural networks, in which optimization inadvertently pushes weights toward negative values. The drift is independent of the training data and occurs with standard loss functions and common activation functions such as ReLU and GELU. The study shows that it can produce significant activation sparsity, potentially affecting model accuracy, and can amplify activation spikes in transformer layers.

    IMPACT Identifies a fundamental training dynamic that could affect model performance and efficiency across various architectures.
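
To make the link between negative weights and activation sparsity concrete, here is a minimal sketch of one ReLU layer. It is an illustration only, not the study's method or results: the function name `relu_sparsity`, the `weight_shift` knob, the layer sizes, and the Gaussian inputs are all assumptions chosen for the example. It relies only on the fact that when inputs are non-negative (as after an earlier ReLU), lowering every weight lowers every pre-activation, so more outputs are clamped to zero.

```python
import random


def relu_sparsity(weight_shift, n_in=32, n_out=64, n_samples=100, seed=0):
    """Return the fraction of ReLU outputs that are zero for one dense layer.

    weight_shift is a hypothetical constant added to every weight to mimic
    a shift of the weight mean; it is not a quantity defined by the study.
    """
    rng = random.Random(seed)
    # Zero-mean Gaussian weights plus the constant shift.
    weights = [
        [rng.gauss(0.0, 1.0 / n_in ** 0.5) + weight_shift for _ in range(n_in)]
        for _ in range(n_out)
    ]
    zeros = 0
    for _ in range(n_samples):
        # Non-negative inputs, as if produced by an earlier ReLU layer.
        x = [max(0.0, rng.gauss(0.0, 1.0)) for _ in range(n_in)]
        for row in weights:
            pre_activation = sum(w * xi for w, xi in zip(row, x))
            if pre_activation <= 0.0:  # ReLU maps this unit to zero
                zeros += 1
    return zeros / (n_samples * n_out)


if __name__ == "__main__":
    for shift in (0.05, 0.0, -0.05):
        print(f"weight_shift={shift:+.2f}  sparsity={relu_sparsity(shift):.3f}")
```

Because the same seed is reused, the three runs differ only in the shift, so the printed sparsity can only stay the same or rise as the shift becomes more negative. How large the drift is in real training, and what it does to accuracy, is a question for the study itself.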