PulseAugur
Researchers propose Gaussian Kernel Attention as a projection-free alternative to standard Transformer…

Researchers have introduced Gaussian Kernel Attention (GKA), a mechanism designed to replace the standard dot-product attention in Transformers. GKA computes token affinities directly with a Gaussian radial basis function (RBF) kernel, eliminating the need for learned linear projections. The approach can be interpreted as normalized kernel regression, which connects Transformers to classical filtering methods. In language-modeling evaluations, a GKA model achieved competitive performance with fewer parameters and less training compute than a standard attention baseline.
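
To make the idea concrete, here is a minimal NumPy sketch of projection-free attention with a Gaussian RBF kernel, written as row-normalized kernel regression over the token representations. It is an illustration under stated assumptions, not the paper's implementation: the function name `gaussian_kernel_attention`, the fixed bandwidth `sigma`, the optional `causal` mask, and the choice to average the raw inputs as values are all hypothetical.

```python
import numpy as np

def gaussian_kernel_attention(X, sigma=1.0, causal=False):
    """Illustrative projection-free attention (assumed form, not the paper's code).

    X:      (n_tokens, d) array of token representations.
    sigma:  kernel bandwidth; a hypothetical fixed hyperparameter here.
    causal: if True, each token attends only to itself and earlier tokens.
    """
    # Pairwise squared Euclidean distances ||x_i - x_j||^2, shape (n, n).
    sq_norms = np.sum(X * X, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off

    # Log of the Gaussian kernel exp(-||x_i - x_j||^2 / (2 sigma^2)).
    logits = -sq_dists / (2.0 * sigma ** 2)
    if causal:
        future = np.triu(np.ones_like(logits, dtype=bool), k=1)
        logits = np.where(future, -np.inf, logits)

    # Row-normalize the kernel (normalized kernel regression weights).
    # Subtracting the row maximum keeps the exponentials numerically stable.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    # Assumption: values are the raw inputs, so no projection matrix is used.
    return weights @ X, weights

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 8))
out, weights = gaussian_kernel_attention(X, sigma=2.0)
```

In this sketch each output row is a weighted average of the input tokens, with weights that decay with distance. As `sigma` shrinks, every token attends almost only to itself; as it grows, the weights approach a uniform average.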

Impact: Introduces a new attention mechanism that could offer a different trade-off between accuracy and efficiency in Transformer models.

Ranking rationale: This is a research paper detailing a new attention mechanism for Transformers.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

  1. arXiv cs.LG · English (EN) · Debarshi Kundu, Archisman Ghosh, Swaroop Ghosh, Vasant Honavar

    Projection-Free Transformers via Gaussian Kernel Attention

    arXiv:2605.02144v1. Abstract: Self-attention in Transformers is typically implemented as $\mathrm{softmax}(QK^\top/\sqrt{d})V$, where $Q=XW_Q$, $K=XW_K$, and $V=XW_V$ are learned linear projections of the input $X$. We ask whether these learned projections are n…
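
For reference, the standard formulation quoted in the abstract can be sketched in a few lines of NumPy. This is a single-head illustration only: the square `d × d` projection shapes, the random initialization, and the variable names are assumptions made for the example, and multi-head structure and any output projection are left out.

```python
import numpy as np

def softmax_attention(X, W_q, W_k, W_v):
    """Single-head softmax(Q K^T / sqrt(d)) V with learned projections of X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # Subtract the row maximum before exponentiating for numerical stability.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n, d = 5, 8  # hypothetical sequence length and model width
X = rng.standard_normal((n, d))
W_q, W_k, W_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

out = softmax_attention(X, W_q, W_k, W_v)

# Parameters held by the three projections in this toy setup: 3 * d * d.
n_projection_params = W_q.size + W_k.size + W_v.size
```

With square projections, the three matrices account for `3 * d * d` parameters per head in this toy setup, which is the kind of cost a projection-free design would avoid.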