Weierstrass Positional Encoding for Vision Transformers
Researchers have introduced Weierstrass Positional Encoding (WePE), a positional encoding for Vision Transformers (ViTs) designed to preserve the inherent 2D spatial structure of images. Existing methods can weaken spatial relationships once patches are flattened into a 1D sequence. WePE instead treats each patch's 2D coordinates as a point in the complex plane and encodes it with the Weierstrass elliptic function, whose doubly periodic lattice structure matches the grid of image patches. The approach aims to model spatial distances more faithfully and allows relative positional information to be derived directly; the authors report consistent performance gains with no significant computational overhead.
AI IMPACT: Introduces an encoding method that could improve the spatial reasoning capabilities of Vision Transformers in computer vision tasks.
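To make the idea concrete, here is a minimal sketch of how patch coordinates might be mapped into the complex plane and passed through the Weierstrass elliptic function. It is not the authors' implementation. The function names, the square lattice with periods 1 and i, the half-patch offset, and the truncated lattice sum are all assumptions chosen for illustration.

```python
# Illustrative sketch only: names, lattice periods, and the truncation
# level are assumptions, not details taken from the WePE paper.

def weierstrass_p(z, omega1=1.0 + 0.0j, omega2=1.0j, n_terms=8):
    """Approximate the Weierstrass elliptic function with a truncated lattice sum.

    Uses the standard series
        p(z) = 1/z^2 + sum over nonzero lattice points w of [1/(z - w)^2 - 1/w^2],
    with w = m*omega1 + n*omega2 and |m|, |n| <= n_terms.
    The truncation makes periodicity only approximate.
    """
    total = 1.0 / (z * z)
    for m in range(-n_terms, n_terms + 1):
        for n in range(-n_terms, n_terms + 1):
            if m == 0 and n == 0:
                continue  # the origin is handled by the leading 1/z^2 term
            w = m * omega1 + n * omega2
            total += 1.0 / ((z - w) ** 2) - 1.0 / (w ** 2)
    return total


def patch_coordinate(row, col, grid_h, grid_w, omega1=1.0 + 0.0j, omega2=1.0j):
    """Map a patch index to a point strictly inside the fundamental cell.

    The half-patch offset keeps every point away from the lattice points,
    where the function has poles.
    """
    u = (col + 0.5) / grid_w
    v = (row + 0.5) / grid_h
    return u * omega1 + v * omega2


def wepe_like_encoding(grid_h, grid_w):
    """Return one (real, imag) feature pair per patch, in row-major order."""
    features = []
    for row in range(grid_h):
        for col in range(grid_w):
            p = weierstrass_p(patch_coordinate(row, col, grid_h, grid_w))
            features.append((p.real, p.imag))
    return features


if __name__ == "__main__":
    encoding = wepe_like_encoding(4, 4)
    print(len(encoding), "patches, first feature pair:", encoding[0])
```

In a real ViT, these two values per patch would presumably be projected to the embedding dimension by a learned layer before being combined with the patch tokens; that step is omitted here.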