Interdomain Attention: Beyond Token-Level Key-Value Memory
Researchers have introduced Interdomain Attention, a mechanism that combines the strengths of Transformers and deep state space models (SSMs). It integrates an SSM into an attention module using kernel methods: attention kernels are approximated with feature maps, and the key features are projected onto a shared set of basis functions managed by an SSM recurrence. In language modeling experiments on FineWeb-Edu, Interdomain Attention outperformed SSMs and matched softmax baselines, particularly at larger scales and longer context lengths.
AI IMPACT: Introduces an architecture that could improve efficiency and performance in large language models.
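To make the kernel view concrete, here is a minimal sketch of the general idea of replacing the attention kernel with a feature map so that keys and values are summarized in a fixed-size recurrent state instead of a per-token key-value cache. It is not the Interdomain Attention method itself: the summary does not specify the feature maps, the basis functions, or the SSM parameterization. The `elu(x) + 1` feature map, the scalar `decay` standing in for an SSM state transition, and all function names are assumptions made for illustration.

```python
import math

def feature_map(x):
    # Assumed positive feature map, elu(x) + 1; the source does not name one.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_quadratic(Q, K, V):
    # Causal kernel attention written token by token: each query scores
    # every earlier key, so memory grows with sequence length.
    out = []
    for t, q in enumerate(Q):
        fq = feature_map(q)
        w = [dot(fq, feature_map(K[s])) for s in range(t + 1)]
        z = sum(w)
        out.append([sum(w[s] * V[s][j] for s in range(t + 1)) / z
                    for j in range(len(V[0]))])
    return out

def attention_recurrent(Q, K, V, decay=1.0):
    # Same computation carried by a fixed-size state (S, z).
    # `decay` is a hypothetical scalar stand-in for an SSM state transition;
    # decay=1.0 reproduces attention_quadratic exactly.
    d_f, d_v = len(feature_map(K[0])), len(V[0])
    S = [[0.0] * d_v for _ in range(d_f)]  # running sum of phi(k) v^T
    z = [0.0] * d_f                        # running sum of phi(k)
    out = []
    for q, k, v in zip(Q, K, V):
        fk = feature_map(k)
        for i in range(d_f):
            z[i] = decay * z[i] + fk[i]
            for j in range(d_v):
                S[i][j] = decay * S[i][j] + fk[i] * v[j]
        fq = feature_map(q)
        denom = dot(fq, z)
        out.append([sum(fq[i] * S[i][j] for i in range(d_f)) / denom
                    for j in range(d_v)])
    return out
```

With `decay=1.0` the two functions agree up to floating-point error, which shows why a kernel feature map lets a recurrence replace token-level key-value memory. Values of `decay` below 1 weight recent tokens more heavily, loosely in the spirit of an SSM recurrence.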