Learning Theory of Transformers: Local-to-Global Approximation via Softmax Partition of Unity

Transformer Learning Theory Explained Through Softmax Approximation

By the PulseAugur Editorial Team · [1 source] · 2026-05-09 09:02

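Researchers have developed a new theoretical framework for understanding how Transformer networks learn regression tasks. Their approach uses a "softmax partition of unity" to combine local function approximations, with the attention mechanism providing spatial localization. The study shows that a Transformer with only two encoder blocks can achieve a uniform approximation error for certain continuous functions, which yields generalization error bounds that are close to minimax optimal.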
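To make the "softmax partition of unity" idea concrete, here is a minimal one-dimensional sketch in Python. It is an illustration under stated assumptions, not the paper's construction: the grid of `centers`, the sharpness parameter `beta`, the squared-distance logits, and the use of first-order Taylor pieces as local approximations are all hypothetical choices made for this example.

```python
import numpy as np

def softmax_partition(x, centers, beta):
    """Softmax weights over centers; each row is nonnegative and sums to 1."""
    # logits[i, k] = -beta * (x_i - c_k)^2. The x_i^2 term is shared by all k,
    # so it cancels in the softmax and the weights depend on x only through
    # scores that are affine in x.
    logits = -beta * (x[:, None] - centers[None, :]) ** 2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def local_to_global(x, centers, beta, f, df):
    """Blend local first-order Taylor pieces with the softmax weights."""
    w = softmax_partition(x, centers, beta)
    # local[i, k] = f(c_k) + f'(c_k) * (x_i - c_k): local piece k evaluated at x_i
    local = f(centers)[None, :] + df(centers)[None, :] * (x[:, None] - centers[None, :])
    return (w * local).sum(axis=1)

def f(t):
    return np.sin(2 * np.pi * t)

def df(t):
    return 2 * np.pi * np.cos(2 * np.pi * t)

# Assumed setup: 21 equally spaced centers on [0, 1] and a fairly sharp softmax.
x = np.linspace(0.0, 1.0, 1001)
centers = np.linspace(0.0, 1.0, 21)
beta = 2000.0

weights = softmax_partition(x, centers, beta)
approx = local_to_global(x, centers, beta, f, df)
max_err = np.max(np.abs(approx - f(x)))  # uniform (sup-norm) error on the grid
print(f"max |approx - f| on the grid: {max_err:.4f}")
```

Because the weights sum to one at every point, the global error at `x` is a weighted average of the local errors, and a sharp softmax puts almost all of the weight on pieces whose centers are close to `x`. That is the local-to-global mechanism the summary describes, shown here in its simplest form.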

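Impact: Provides a theoretical foundation for understanding what Transformers can do in regression tasks, which may guide future architectural improvements.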

Ranking rationale: An academic paper detailing an advance in machine learning theory.

Read on arXiv stat.ML

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv stat.ML · Wenjing Liao · 2026-05-09 09:02

Learning Theory of Transformers: Local-to-Global Approximation via Softmax Partition of Unity

This paper investigates the learning theory of Transformer networks for regression tasks on the compact Euclidean domain $[0,1]^d$ and $d$-dimensional compact Riemannian manifolds. We propose a novel constructive approximation framework for Transformers that builds local approxim…

