Dynamic Short Convolutions Improve Transformers
Researchers have introduced dynamic short convolutions, a new primitive for the Transformer architectures used in large language models. Unlike static short convolutions, whose filter weights are fixed after training, dynamic short convolutions compute their filters from the input, which increases expressivity while keeping the locality bias of a conventional convolution. In the reported experiments they consistently outperform both standard Transformers and static convolutional variants across parameter scales, suggesting a significant compute advantage for Transformer-based language models.
IMPACT: Introduces a novel technique that offers compute advantages and performance gains for Transformer-based language models.
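
To make the difference between static and input-dependent filters concrete, here is a minimal sketch in plain Python. It is not the researchers' implementation: the summary does not say how the filters are generated, so this sketch assumes a hypothetical linear projection (`proj`, `bias`) that maps the current token to one filter tap per offset, shared across channels. The function names are also illustrative.

```python
def static_short_conv(x, w):
    """Causal short convolution with one fixed filter w (length K).

    x is a list of T token vectors; the same taps are used at every position.
    """
    out = []
    for t in range(len(x)):
        y_t = [0.0] * len(x[t])
        for k, tap in enumerate(w):
            if t - k < 0:
                break  # causal: nothing exists before the first token
            for d, v in enumerate(x[t - k]):
                y_t[d] += tap * v
        out.append(y_t)
    return out


def dynamic_short_conv(x, proj, bias):
    """Causal short convolution whose filter is recomputed at every position.

    proj (K x D) and bias (length K) are hypothetical parameters: the filter
    at position t is assumed to be a linear function of the token x[t].
    """
    out = []
    for t in range(len(x)):
        # Input-dependent filter: one tap per offset k, derived from x[t].
        f_t = [b + sum(p * v for p, v in zip(row, x[t]))
               for row, b in zip(proj, bias)]
        y_t = [0.0] * len(x[t])
        for k, tap in enumerate(f_t):
            if t - k < 0:
                break  # same causal window as the static version
            for d, v in enumerate(x[t - k]):
                y_t[d] += tap * v
        out.append(y_t)
    return out
```

Both functions look only at the last K tokens, which is the locality bias the summary refers to. The only difference is where the taps come from. If `proj` is all zeros, the dynamic version reduces to the static one with `w = bias`, so under this assumed parameterization the static convolution is a special case of the dynamic one.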