Researchers have introduced a measure-theoretic framework for analyzing the expressive power of Transformer architectures in modeling contextual relations. The framework links standard softmax attention to entropy-regularized optimal transport by treating attention as a normalized affinity function. The study establishes a universal approximation theorem showing that Transformers can approximate arbitrary contextual relation rules, with the choice of normalization influencing how these relations are represented.
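As a rough illustration of the softmax and optimal-transport connection mentioned above, the sketch below checks a standard fact numerically: each row of softmax attention is the maximizer, over the probability simplex, of the expected score plus an entropy bonus. This is not the paper's own construction. The function names, the temperature `eps`, and the Sinkhorn-style doubly normalized variant are assumptions introduced here only to show how the choice of normalization changes the resulting affinity matrix.

```python
import numpy as np

def softmax(scores, eps=1.0):
    """Row-wise softmax of scores / eps, computed stably."""
    z = scores / eps
    z = z - z.max(axis=-1, keepdims=True)
    w = np.exp(z)
    return w / w.sum(axis=-1, keepdims=True)

def entropic_objective(p, scores, eps=1.0):
    """Per-row value of <p, scores> + eps * H(p), with H the Shannon entropy."""
    p_safe = np.clip(p, 1e-300, None)  # avoid log(0); 0 * log(0) is treated as 0
    entropy = -(p * np.log(p_safe)).sum(axis=-1)
    return (p * scores).sum(axis=-1) + eps * entropy

def sinkhorn(scores, eps=1.0, n_iters=500):
    """Hypothetical alternative normalization: alternately rescale rows and
    columns of exp(scores / eps) so both sets of marginals approach 1."""
    K = np.exp((scores - scores.max()) / eps)
    u = np.ones(K.shape[0])
    v = np.ones(K.shape[1])
    for _ in range(n_iters):
        u = 1.0 / (K @ v)    # enforce unit row sums
        v = 1.0 / (K.T @ u)  # enforce unit column sums
    return u[:, None] * K * v[None, :]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = rng.normal(size=(4, 4))  # toy query-key affinity scores
    eps = 0.5

    P = softmax(S, eps)
    best = entropic_objective(P, S, eps)

    # No randomly drawn row distribution should beat the softmax rows.
    Q = rng.dirichlet(np.ones(4), size=4)
    print("softmax rows are optimal:",
          bool(np.all(entropic_objective(Q, S, eps) <= best + 1e-12)))

    # Row-only normalization leaves column sums unconstrained;
    # the Sinkhorn variant constrains both.
    print("softmax column sums: ", P.sum(axis=0))
    print("sinkhorn column sums:", sinkhorn(S, eps).sum(axis=0))
```

In this sketch, softmax corresponds to constraining only the row marginals, while the Sinkhorn variant constrains both marginals. It is one concrete way in which the normalization method can change the affinity matrix for the same scores.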
Impact: Provides a theoretical foundation for Transformer capabilities, potentially guiding future architectural improvements.
Ranking rationale: Academic paper introducing a new theoretical framework for understanding Transformer architectures.
AI-generated summary · Google Gemini · from 1 source.