𝐃𝐞𝐥𝐭𝐚 𝐀𝐭𝐭𝐞𝐧𝐭𝐢𝐨𝐧 𝐑𝐞𝐬𝐢𝐝𝐮𝐚𝐥𝐬 [R]
Researchers have introduced Delta Attention Residuals, a modification of residual connections that improves how information is routed across layers. Instead of routing over the cumulative hidden states themselves, the method routes over the deltas between consecutive hidden states (what each layer adds to the residual stream), which the authors report helps prevent routing collapse in deep layers. The technique shows consistent perplexity improvements across several model sizes and can be applied to pretrained models through drop-in fine-tuning with minimal parameter overhead.
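As a rough illustration of the difference between routing over cumulative states and routing over deltas, here is a minimal NumPy sketch. It is not the authors' implementation. The router (one query vector scored against each source by scaled dot product, then a softmax over layers), the helper names `layer_deltas` and `attention_route`, and the toy random residual stream are all hypothetical, chosen only to show where the deltas enter.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def layer_deltas(states: np.ndarray) -> np.ndarray:
    # states: (L, d) stack of hidden states h_0..h_{L-1} from a residual stream.
    # Delta i is what step i added: h_0 itself, then h_i - h_{i-1}.
    return np.concatenate([states[:1], np.diff(states, axis=0)], axis=0)


def attention_route(sources: np.ndarray, query: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    # Hypothetical cross-layer router: scaled dot-product scores between a
    # per-layer query and each source, softmax-normalized over the layer axis.
    scores = sources @ query / np.sqrt(sources.shape[-1])
    weights = softmax(scores)
    return weights @ sources, weights


rng = np.random.default_rng(0)
L, d = 6, 16
states = np.cumsum(rng.normal(size=(L, d)), axis=0)  # toy residual stream (assumption)
query = rng.normal(size=d)                           # stands in for a learned query

# Route over cumulative states h_i (the baseline) vs. over per-layer deltas.
mixed_cum, w_cum = attention_route(states, query)
mixed_delta, w_delta = attention_route(layer_deltas(states), query)

print("weights over cumulative states:", np.round(w_cum, 3))
print("weights over deltas:           ", np.round(w_delta, 3))
```

Because the deltas telescope back to the current hidden state, routing over them discards no information; it only changes which quantities the router compares. One plausible intuition, not stated in the source, is that cumulative states in deep layers become increasingly similar to one another, so scores over them are hard to tell apart, whereas per-layer deltas stay distinct. In this sketch the only new parameters would be one d-dimensional query per layer, which is one hypothetical way the overhead could stay small.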
IMPACT: This architectural change could lead to more efficient, better-performing large language models.